Hey there, Travelers!
Welcome to our new development update—and buckle up because this one is about to be pretty long and different from other update posts we’ve shared so far!
This time we’d like to delve deep into the abyss of video game optimization and its technicalities—and answer some questions that keep popping up on our Steam forums and our Discord channel.
And yeah, we've tried to make this post as readable as possible for a non-technical audience!
BUT! Before we step into the world of game development, let us start with a HUGE request and an opportunity for those who can't wait for the 1.0 release.
To ensure our work and changes to the game meet your expectations, we're looking for eager beta testers to try out and provide feedback for the revamped tutorial and Horns of the South area! The tests are planned for the end of October. Please note that you will have to own Tainted Grail: The Fall of Avalon on Steam to participate in the beta test. If you'd like to join and share your opinion, please fill out the form below. For people who help us with feedback, we will also have a small secret surprise as a thank you!
Registration form: https://forms.gle/kwEo7TsCNsL2iUiW8
Now, let’s get to the point of the update!
So, first off, our game is made using the Unity engine. Unity is a multipurpose engine that allows people to make pretty much everything—from Web applications, through VR and mobile applications, to PC or console games. It supports both 2D and 3D games, and it’s REALLY popular because getting something, anything, to work is very easy and fast—there are many tools you can use without having to write your own.
However, while built-in Unity systems and tools work well for most use cases, Tainted Grail: The Fall of Avalon isn’t exactly a standard use case of the Unity engine. Creating an open-world, first-person RPG requires a bit more... customized approach.
A game like Tainted Grail: The Fall of Avalon poses various complex challenges, like drawing dozens of NPCs and 150 to 300 thousand instances of assets while running all the logic that actually makes the game work, all at the same time. As you can imagine, these aren’t exactly “standard challenges” for the Unity engine. Or most game engines, for that matter!
So, what we’re constantly doing is adding new sets of customized tools and solutions to Unity in order to make our game run better.
But before we get to these tools and solutions, let’s consider the Magic Formula:
STABLE 60 FRAMES PER SECOND
Every gamer knows that a stable 60 fps is the magic number that every game should aim to achieve. We—as developers and gamers ourselves—agree with this sentiment, and we laugh out loud whenever we hear that we can’t spot the difference between 24-30 fps and 60. We can. You can. Everybody does, and 60 fps just FEELS better, period.
However, when you’re working on a video game, especially while you’re trying to make it run better, you quickly stop thinking in frames per second. In development, it is, essentially, a useless number because it’s the outcome, and not the solution.
As a dev, you quickly learn to think in terms of technological “cost” or technological “expenses”. If you were to hear our conversations, you’d hear sentences like “this and that is very expensive” or “no, this will cost too much”, pretty much every single day.
What we mean by that is that we have a certain technological “budget” for stuff that can happen in a single frame of the game, and this budget is our Magic Formula:
16.6 MILLISECONDS
This is how much we have to calculate the next game state and render it in order to achieve 60 fps on recommended specs.
As a point of comparison, a single blink of a human eye takes at least 100 milliseconds.
This means that everything we do on the technical side, the things we’ve already mentioned (like drawing dozens of NPCs and 150-300 thousand instances of assets, and all the logic that actually makes the game work), needs to happen constantly, within 16.6 milliseconds.
Now, as you can imagine, everything we add to the game has a technological cost. So what we’re constantly doing is a balancing act: we constantly want to ADD MORE AND MORE STUFF TO THE GAME and MAKE IT BETTER and PRETTIER, but at the same time we need to hold our horses and always refer to the magic number: the damned 16.6 ms.
So, with all that out of the way, one last note: whenever we say that “System X gives us 2 ms,” it means that we’ve reduced the cost of something by 2 ms. Without that particular system, we would hit 18.6 ms per frame, which translates directly to roughly 54 fps instead of 60. And that’s what we don’t want.
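If you'd like to check the arithmetic yourself, here is a minimal Python sketch of the conversion between frames per second and milliseconds per frame. The function names are ours, made up for this illustration; nothing here comes from the game's code.

```python
def frame_budget_ms(target_fps: float) -> float:
    """Time available per frame, in milliseconds, for a given frame rate."""
    return 1000.0 / target_fps


def fps_from_frame_time(frame_time_ms: float) -> float:
    """Frame rate that results from a given per-frame cost in milliseconds."""
    return 1000.0 / frame_time_ms


budget = frame_budget_ms(60)
over_budget = fps_from_frame_time(18.6)  # budget plus an extra 2 ms of cost

print(f"Budget at 60 fps: {budget:.2f} ms")
print(f"18.6 ms per frame: {over_budget:.1f} fps")
```

The relationship is a simple reciprocal, which is why every extra millisecond hurts more the higher the frame rate you are aiming for.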
The essential tl;dr of this post is this:
- We are continuously working on the optimization of our game.
- Since our Early Access release, we have filled the world with many new objects and still managed to improve performance at the same time (the last patch runs 10-15 fps faster than the Early Access release).
- Now, only 4 months after our last patch, we have again internally managed to find a few fps more and reduced memory usage with features like Kandra (you can read about them in the rest of this update).
- We hope to deliver the best optimization we can for the final release. It’s a very difficult and tedious process, and we’ll try to bring it closer to you with this post, describing our challenges and biggest accomplishments in this area in the last 2 years.
If you want more information, please read on!
Challenges
There are three main challenges we face while optimizing Tainted Grail: The Fall of Avalon.
The main challenge we have on high-end PCs is heavy CPU usage.
This is caused by various CPU-side calculations: the rendering pipeline, animation system, physics, and game logic. It means that at some point the GPU doesn’t matter anymore, and an RTX 4090 won’t do better than an RTX 4070, since the bottleneck lies elsewhere: in a single core of the CPU.
Yes, single core—most game engines, Unity included, don’t handle multithreading very well.
The second challenge that we have to keep in mind is RAM+VRAM usage.
It requires good management of textures, models, animations, audio, etc.
The last challenge is not to overwhelm the GPU.
It’s easy to add better effects and shaders, which will work fine on higher-end rigs but totally destroy machines at the opposite end of the spectrum. So we need to make improvements to the places above, and at the same time, we continuously have to watch all other metrics.
Our Approach To Handling Them
First, we have to explain something about the Unity Engine.
The default, basic building blocks of the Unity Engine are called GameObjects and Transforms. You can think of them as containers: they store information about an object’s position in space, along with data that other systems require. This means that plenty of stuff refers to these basic building blocks. But even though they’re very convenient and great for simpler games, it turns out that they are unfortunately very expensive, especially when counted in thousands.
So by not using these two fundamental building blocks, we can improve performance. The issue starts with the fact that many other systems that Unity provides are fully based on them. This means that, without writing our own implementations of many GameObject-based systems, we cannot get rid of this dependency. Obviously we do not have the resources to rewrite everything with our team size, so we need to pick and choose which systems to rewrite and slowly go forward.
In many cases, optimizations force us to rewrite or remove various dependencies on default systems of the Unity Engine and use newer systems that Unity provides (ECS, BRG). We’re very happy whenever we get a new shiny toy from them, and we keep improving our backend to use ANYTHING that we’re given; we’re trying to stay as “current” as possible.
But, sometimes, we have to invent stuff on our own.
And here we finally get to the main part of this post: an explanation and showcase of systems we’ve been working on for the past two years in order to optimize the game.
Some of these systems are already in the game. Others are being written as we type this because, as we’ve said before, we’re CONSTANTLY working on optimization.
Systems We’ve Already Implemented
Vegetation: Leshy
You can’t have a dark fantasy game without trees, grass, and other types of vegetation. And obviously, the more of it you want to have in the game, the more expensive it becomes. That’s why we created Leshy—our own, custom way of rendering vegetation. The name comes from Slavic mythology; it’s a tutelary deity of the forests ;)
Non-technical tl;dr ->
- It’s a custom way of rendering all vegetation in-game, and it’s multiple times faster than any other available solution.
Technical:
- At the start of development, we decided to use a store-bought package for displaying vegetation. But, in time, we found that it did not scale well with the growing number of vegetation types we decided we needed. It also relied on outdated systems and methods. And last but not least, the package had poor streaming capabilities, which are very important for the second challenge we mentioned (RAM+VRAM usage).
- This forced us to rewrite the runtime part of the system to use the new Unity API: BatchRendererGroup, as well as make a translation layer between the asset editor flow and the new runtime implementation.
- Now, all vegetation in-game is rendered by our own solution, which manages changing LODs (Level of Detail), loading and unloading assets from memory, culling, etc. It’s multiple times faster than the best solutions offered on the Asset Store or built-in Unity.
- The streaming was also fully remade with performance and memory as the main focus.
Leshy gives us about 1-1.5 ms (about 9% of our total budget) and requires under 30 MB of RAM and VRAM combined (the old solution required ~500 MB), while allowing significantly more species of foliage.
[previewyoutube=-rf8i5bYm-w;full][/previewyoutube]
Static Repetitive Objects: Medusa
If you played Tainted Grail already, you probably noticed that we have a lot of ROCKS in the game. They’re a part of what’s known as a “static environment”, so, in essence, things that don’t move or don’t change but are constantly there.
The fact that they’re constantly in the background means that they, too, can become expensive because your computer needs to take them into consideration whenever it’s rendering a new frame or whenever you move your character. So even though they’re just there and don’t move and don’t change, they’re costly.
That’s where our custom solution for the static environment comes into play—we called it Medusa because, just as Medusa in Greek mythology used to change people into stone, we use this tool mostly for rocks, and we want to keep them in place :D
Non-technical tl;dr ->
- In open-world games, you will find many assets repeated in hundreds, if not thousands of places. Rotated, scaled, and fit in so many different ways that they blend perfectly with the environment.
- Assets that are everywhere in the game and don’t move (like rocks) can be rendered really fast (or in a cheap way), but it requires another big system that makes best use of Unity’s rendering capabilities.
Technical:
- By default, such objects are still quite expensive because of GameObjects and the many calculations on the Unity side. A lot of those calculations can be stripped out once we know the objects’ limitations (they never move and never change).
- Medusa renderer collects data about such objects, destroys respective GameObjects, and takes care of rendering them by itself, using, again, BatchRendererGroup.
- Since they are repetitive, there is no need to continuously stream in and out associated assets. This way, we achieved probably the fastest possible way of rendering that is available in Unity.
Medusa gives us about 1-1.5 ms (about 9% of our total budget).
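To give a rough feel for the core idea (and only the idea: this is not Medusa's code and not Unity's BatchRendererGroup API), here is a small Python sketch. It assumes a made-up `StaticObject` record and simply groups objects that share a mesh and material, keeping nothing but the per-instance placement data.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class StaticObject:
    """Hypothetical stand-in for a heavy scene object such as a rock."""
    mesh: str          # made-up asset name, e.g. "rock_a"
    material: str
    position: tuple    # (x, y, z)
    rotation_y: float  # degrees
    scale: float


def build_batches(objects):
    """Group objects sharing a mesh and material into one batch each."""
    batches = defaultdict(list)
    for obj in objects:
        # Keep only the lightweight per-instance data; the heavy object
        # itself is no longer needed once it has been collected.
        batches[(obj.mesh, obj.material)].append(
            (obj.position, obj.rotation_y, obj.scale)
        )
    return dict(batches)


scene = [
    StaticObject("rock_a", "stone", (0, 0, 0), 0.0, 1.0),
    StaticObject("rock_a", "stone", (5, 0, 2), 90.0, 1.5),
    StaticObject("rock_b", "stone", (9, 0, 1), 45.0, 0.8),
    StaticObject("rock_a", "stone", (3, 1, 7), 180.0, 2.0),
]

batches = build_batches(scene)
for (mesh, material), instances in batches.items():
    print(f"{mesh}/{material}: {len(instances)} instances in one batch")
```

In this toy scene, four objects collapse into two batches. The real system has a lot more to deal with (culling, LODs, GPU buffers), but repeated assets grouping this well is what makes the approach pay off.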
[previewyoutube=kPbsB82Tp14;full][/previewyoutube]
Other Objects: Drake
Now this one is tricky. So we have Leshy for vegetation and Medusa for static objects. That’s all great, but it turns out that in order to be optimized, we needed something similar to them for everything else. That’s why we came up with Drake.
(Why is it called Drake? Mostly because we didn’t want to call it a Dragon because it would be tacky. What does it have to do with Dragons, though? Well, as the main brain behind this system once said: “Dragons guard gold, and this system is gold”.)
Non-technical tl;dr ->
- We already have vegetation and static rocks that build the overall world's static environment. However, to fill it with content and details, we needed another system, called Drake.
- So, essentially, it’s very similar to Medusa and Leshy, so we won’t repeat the explanation, but it’s used for other types of objects.
Technical:
- It utilizes Unity ECS and Addressables to manage resources and to render them.
- By default, ECS doesn't work with resources and mipmaps streaming, so we needed to implement it in-house.
- ECS rendering is slower than Leshy and Medusa but allows for more dynamic operations (both Leshy and Medusa need to be baked and static) and gives more flexibility.
- Also, streaming is another hit for the CPU, but a big relief for RAM+VRAM.
Drake gives us about 1.5-2 ms (about 12% of our total budget) and is 700MB easier on RAM.
Huge Structures: HLODs (hierarchical level of detail)
This is another one of our custom solutions, and this time it has no fancy name, just an abbreviation. If you come up with something cool, please let us know in the comments!
In order to understand HLODs, we have to start with LODs.
When you look at an asset in a game from a certain distance, you don’t have to see it in its full quality—it would be VERY expensive to render fully detailed stuff that’s hundreds of meters away. What you’re actually seeing is a particular Level of Detail.
Each asset can have various Levels of Detail (LODs), which change its quality depending on the distance from the player. They should switch seamlessly and should be prepared and set up so that you never notice the change. But considering how many assets need them, it’s rather tricky and difficult to set everything up 100% properly.
And if we take this idea and apply it to the multiple objects that are next to each other, we have Hierarchical Level of Detail, or HLODs. Hierarchies and groups: plenty of stuff dynamically changing its quality depending on how well you can see it at a particular distance.
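For the curious, per-asset LOD switching can be sketched in a few lines of Python. The distance thresholds below are invented for this example and are not the values used in the game.

```python
import bisect

# Hypothetical switch distances in meters:
# LOD0 below 20 m, LOD1 below 60 m, LOD2 below 150 m, LOD3 beyond that.
LOD_DISTANCES = [20.0, 60.0, 150.0]


def select_lod(distance: float, thresholds=LOD_DISTANCES) -> int:
    """Return the LOD index for a distance (0 is the most detailed)."""
    return bisect.bisect_right(thresholds, distance)


for d in (5.0, 20.0, 100.0, 500.0):
    print(f"{d:6.1f} m -> LOD{select_lod(d)}")
```

The tricky part mentioned above is not this lookup but picking thresholds and meshes for every asset so that the switch stays invisible.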
Non-technical tl;dr ->
- We’re merging objects into bigger objects, and they get their hierarchical level of detail so that everything is even cheaper.
- However, we can’t merge objects and simplify these hierarchies by hand at the asset level, because it would multiply the artists’ workload hundreds of times over, so it needs to be automatic.
Technical:
- Hierarchical Level of Detail means packing structures in the same areas in hierarchical trees and baking simplified meshes for each node.
- When we move away enough, a given node replaces its contents (many little objects) with one big mesh that is significantly cheaper to render.
- At the same time, it is cheaper to make visibility calculations for 1-4 objects than for thousands.
- The last part is streaming (again); distant objects are less detailed so we can use less detailed (smaller) resources.
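As a toy illustration of the tree idea described in the points above, here is a Python sketch. The `HlodNode` class, the names, and the distances are all made up for this example, and the real baking of simplified meshes is left out entirely.

```python
import math
from dataclasses import dataclass, field


@dataclass
class HlodNode:
    """Hypothetical node: either a single object (leaf) or a group."""
    name: str
    center: tuple            # (x, z) center of the area the node covers
    switch_distance: float   # beyond this, draw the merged proxy instead
    children: list = field(default_factory=list)


def collect_draws(node, camera, draws=None):
    """Return the names of everything drawn for a given camera position."""
    if draws is None:
        draws = []
    if not node.children:
        draws.append(node.name)             # leaf: the original object
    elif math.dist(node.center, camera) > node.switch_distance:
        draws.append(node.name + "_proxy")  # far away: one merged mesh
    else:
        for child in node.children:         # close enough: look deeper
            collect_draws(child, camera, draws)
    return draws


village = HlodNode("village", (0, 0), 200, [
    HlodNode("houses", (-10, 0), 80, [
        HlodNode("house_1", (-12, 0), 0),
        HlodNode("house_2", (-8, 0), 0),
    ]),
    HlodNode("market", (10, 0), 80, [
        HlodNode("stall_1", (10, 2), 0),
        HlodNode("stall_2", (10, -2), 0),
    ]),
])

for camera in [(0, 0), (100, 0), (500, 0)]:
    print(camera, collect_draws(village, camera))
```

Standing in the village, you get all four original objects. From 100 meters away, each group collapses into its own proxy, and from 500 meters the whole village is a single mesh.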
HLODs give us about 2 ms (about 12% of our total budget), and the gain gets bigger the lower-spec your GPU is. (In future releases we will also implement VRAM optimizations based on HLODs; stay tuned.)
[previewyoutube=wT36a0YYBDI;full][/previewyoutube]
Critters: giving life to the world
If you’re working on a video game in Unity, you’ll quickly realize that:
- Animation systems are EXPENSIVE.
- Rendering animated characters is EXPENSIVE.
So when we wanted to put some crabs on the beach, we noticed that this, too, needed to be rewritten and optimized on our end.
Non-technical tl;dr ->
- We have created a custom way of animating critters so that a couple of crabs on the beach won’t destroy the game’s fps.
Technical:
- We bake animations into textures and apply them to the skinned mesh in a shader (VAT: vertex animation texture); that way we completely bypass Unity’s animation system and SkinnedMeshRenderer.
- As a cherry on top, we moved them to ECS, which brings further improvements over Unity’s GameObject flow.
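To show what "baking animations into textures" means in practice, here is a simplified Python sketch of the general VAT technique. It runs on the CPU purely for illustration (in a real game the lookup happens in a shader on the GPU), and the `wave` animation and the three-vertex "crab" are invented stand-ins, not our actual assets.

```python
import math


def bake_animation(rest_positions, pose_at, frame_count):
    """Bake an animation into a 'texture': one row per frame,
    one texel (x, y, z) per vertex."""
    return [
        [pose_at(vertex, frame / frame_count) for vertex in rest_positions]
        for frame in range(frame_count)
    ]


def sample_vertex(texture, vertex_index, time01):
    """Read a vertex position at a normalized time, blending the two
    nearest baked frames. Time wraps around, so the animation loops."""
    frame_count = len(texture)
    exact = (time01 % 1.0) * frame_count
    f0 = int(exact) % frame_count
    f1 = (f0 + 1) % frame_count
    t = exact - int(exact)
    a = texture[f0][vertex_index]
    b = texture[f1][vertex_index]
    return tuple(pa + (pb - pa) * t for pa, pb in zip(a, b))


def wave(vertex, phase):
    """Hypothetical stand-in for a real animation: vertices bob up and down."""
    x, y, z = vertex
    return (x, y + math.sin(2 * math.pi * phase + x), z)


crab_vertices = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0), (2.0, 0.0, 0.0)]
texture = bake_animation(crab_vertices, wave, frame_count=8)
print(sample_vertex(texture, 0, 0.25))
```

All the expensive work happens once, at bake time. At runtime, animating a vertex is just a table lookup and a blend, which is why a beach full of crabs becomes affordable.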
[previewyoutube=YBz6jK-3yM0;full][/previewyoutube]
Last but not least, an in-progress system...
Character rendering: Kandra
Our custom solution—with a name taken from the novels of Brandon Sanderson—Kandra.
Kandra was a creature that could absorb someone’s bones and skin to impersonate them, and... that’s kind of how it works in our game, too!
This is our newest addition to the game, and it’s a system that’s still being written at this moment, but we’re extremely excited about it because the performance boost is substantial!
Non-technical tl;dr ->
- It renders characters (all NPCs, monsters, etc.) in a way that is cheaper on the CPU side and uses a lot less VRAM than Unity’s default way.
- You can think of it as compressing a huge .wav file into an .mp3 :D
Technical:
- That one is very simple, but at the same time, it is very difficult, complex, and technical. So in order not to make this update even longer, let’s say that:
- There are certain things that we use, such as skinned meshes (mesh in which every vertex changes its position and other attributes with respect to changes to its skeleton’s bones), blend shapes (deformations that contain displacement of each vertex of the mesh), and clothes culling (which allows us to hide certain parts of the mesh). For various technical reasons, they generate A LOT of data.
- There’s also something called Compute Dispatch which is essentially related to the way in which, in certain cases, the CPU transfers data and communicates with the GPU, and both of these operations are SUPER SLOW.
- For various technical reasons, Unity forces us to store the same data multiple times, and the data is uncompressed, so the result is high memory consumption as well as inefficient CPU and GPU usage.
- Kandra stores this data in a highly compressed version and makes communication and data transfers between CPU and GPU MUCH FASTER.
Results of our internal tests, showing the cost of rendering 150 NPCs in an empty scene.
Conclusion
We’ve come to a point where we use Unity’s out-of-the-box rendering only for VFXs.
We have our own systems that make better use of Unity’s capabilities for pretty much everything else in the game. This allows us to create an open-world, first-person RPG in Unity with as few loading screens as we could manage.
There are still many things that could be improved, like the animation system, physics, parts of the game logic, and the HLOD setup. Streaming is also far from perfect, and obviously the assets themselves can be improved too.
As we move forward, new things appear, and we have to maintain our systems and take care of new challenges at the same time.
But in the end, you should see major performance boosts with every major patch that we release, and we plan to continue this effort.
Measurements were done on PC with:
- AMD Ryzen 7 3800X (the lower end of the medium bucket)
- RTX 3080 Ti (the lower end of the high bucket)
That’s it for now! Tell us what you think about those types of technical updates. If you find it interesting, we might share some more details in the future. For now, stay tuned for our upcoming content update. It will be a bit delayed, but we’re sure it’ll be worth the wait!
Stay safe, Travelers!
https://store.steampowered.com/app/1466060/Tainted_Grail_The_Fall_of_Avalon/