This time I’m going to look at a game that came out a long time ago. The possibility of a new Batman game arriving soon made me want to go back and finish the Arkham series. This last installment was released in mid-2015, so I waited almost 5 years to do this. The reason this game might be interesting is that it’s one of the last big games made on Unreal Engine 3, or rather on a heavily modified version of it. I’m not sure that if we got to see the sources we would recognize much of UE3, at least not on the rendering side. The engineers at Rocksteady and WB Games Montréal probably worked really hard to keep the engine up to date. After all, by the time Arkham Knight was released the engine was already 9-10 years old (depending on how you count).
Arkham Knight unfortunately has a pretty bad reputation for its streaming issues. Even on a PC built today from high-end components this can be felt during gameplay, especially in the driving sequences, which are completely new to this installment. Other than that the gameplay is pretty much the same as in the previous games: fight bad guys, solve crimes, jail villains. The stakes are higher than ever before, as you would expect from the last game in the series.
Fortunately there was no issue with capturing the game this time. RenderDoc worked out of the box and I was able to comfortably gather a few different captures that showcase different aspects of the rendering pipeline.
Before jumping into the thick of it I would like to point out a few things that surprised me a bit about this game.
First of all, even though the game uses an engine from the DX9 era (updated to use DX11), it actually uses many features exclusive to DX11. In a frame where nothing special is going on I counted 44 Dispatch calls. That’s quite a lot even for games running on more modern codebases. The game also uses deferred contexts to render the world, probably to avoid the bottleneck of preparing the command lists on a single thread.
The second thing I would like to point out is that there are a lot of drawcalls where the vertex count is really high. The environment is filled with all kinds of industrial elements like steel bars, pipes and the like and these don’t seem to have any efficient LOD scheme.
As an example I highlighted the drawcall for the clock faces of the tower of the Gotham City Police Department.
Now let’s look at the mesh that was rendered on this tiny area of the screen.
This mesh has 46,068 indices.
This is just one example of the many meshes that could easily do with some kind of efficient LOD scheme, which would probably make the game render faster. There are two reasons I can think of to leave it like this. One is if the CPU is the bottleneck anyway and we don’t want to burden it further with LOD calculations. This doesn’t really work on PC, because you can’t know whether another hardware configuration will be CPU bound or not. The second one would be if the LODs would take more memory and that would make the memory budget of the game explode. This is certainly a tradeoff that might have had to be made in this case.
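To make the tradeoff concrete, here is a minimal sketch of the kind of screen-coverage based LOD selection the clock-face mesh could benefit from. This is purely illustrative; the thresholds and the bounding-sphere approximation are my own assumptions, not anything from the game’s code.

```python
# Hypothetical LOD selection by projected screen coverage -- an illustration
# of the technique discussed above, not Rocksteady's actual code.
import math

def projected_radius_px(bounding_radius, distance, fov_y_rad, screen_height_px):
    """Approximate radius of a bounding sphere in pixels after projection."""
    return (bounding_radius / (distance * math.tan(fov_y_rad * 0.5))) * (screen_height_px * 0.5)

def pick_lod(bounding_radius, distance, fov_y_rad, screen_height_px,
             thresholds_px=(200.0, 80.0, 30.0)):
    """Return 0 (full detail) .. len(thresholds_px) (lowest detail)."""
    r = projected_radius_px(bounding_radius, distance, fov_y_rad, screen_height_px)
    for lod, threshold in enumerate(thresholds_px):
        if r >= threshold:
            return lod
    return len(thresholds_px)

# A roughly 2 m clock face seen from 400 m away only covers a few pixels,
# so it would get the coarsest LOD instead of a 46k-index mesh.
print(pick_lod(2.0, 400.0, math.radians(60.0), 1080))
```

The per-object cost is a handful of multiplications, which is exactly why the "CPU is the bottleneck" argument above only holds if there are very many such objects.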
The game does use mesh proxies for things that are really far away. These mesh proxies also have very high vertex counts in general. Many of them suffer from the same issues where the mesh is way too detailed for the area that it covers on the screen. For example railings and such thin meshes are kept as part of the mesh instead of discarding them. This suggests that these meshes are also created by merging several LOD0 meshes which is not ideal for performance.
The game also renders most meshes in a seemingly random order, and in many cases objects are drawn that are later at least partially rendered over. This means that a lot of pixels are shaded more than once (more on this later).
There are many different views in Arkham Knight but I chose a pretty iconic one that happens very frequently in the game. This is good to demonstrate the size of the open world but I will use some other captures throughout my analysis to show some specific aspects of the rendering.
So let’s see how we will get to this impressive result.
Every frame I captured started with an update of something that looks like an atlas of environment/light probes. There are at most 4 different textures used by these probes, corresponding to some kind of deferred setup. There is a depth texture, an albedo, a normal map and one storing two different depth values most likely for shadow calculations (maybe something like VSM, but with each channel normalized separately). Some of the probes seem to be using only the last texture, so most likely those are just shadowed local lights. The textures are 8k and they contain differently sized tiles, probably depending on the distance from the probe or some similar heuristic.
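If the two depth channels really are VSM-style moments, the lookup would be the textbook Chebyshev test. This is a sketch of that standard technique, assuming the channels store the filtered depth and squared depth; whether the game actually does this is speculation on my part.

```python
# Standard variance shadow map (VSM) visibility test -- assuming the
# two-channel texture stores E[z] and E[z^2]; this is the generic technique,
# not necessarily what Arkham Knight does.
def vsm_visibility(moments, receiver_depth, min_variance=1e-4):
    """moments = (mean_depth, mean_depth_squared) filtered from the shadow map."""
    mean, mean_sq = moments
    if receiver_depth <= mean:
        return 1.0  # receiver is in front of the average occluder: fully lit
    variance = max(mean_sq - mean * mean, min_variance)
    d = receiver_depth - mean
    # Chebyshev's inequality gives an upper bound on the lit fraction.
    return variance / (variance + d * d)

print(vsm_visibility((0.30, 0.0925), 0.31))  # just behind the occluder: mostly lit
print(vsm_visibility((0.30, 0.0925), 0.60))  # far behind it: mostly shadowed
```

The appeal of VSM for probe atlases is that the moments can be filtered like any other texture, which fits the tiled, reusable nature of the atlas.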
Most of the Dispatch calls are actually coming from this first part of the rendering because every update does a couple draws to a few much smaller textures and then uses compute shaders to copy the updated part of the content to the atlas. I wonder if this ping-ponging between draw and dispatch is actually worth it. There is a certain cost that needs to be paid in switching to compute and back, and depending on the scenario this cost might cancel out any benefit.
After all the probe updates the main part of the rendering begins which is filling the GBuffer. The setup for this is slightly different than other games I’ve seen.
First of all there is an R11G11B10 texture for emissives (more on this in a second). Then there is an R10G10B10A2 texture where the normals are stored. I couldn’t find any capture where anything was stored in the alpha channel, but maybe I was just unlucky. Then there is an albedo texture with R8G8B8A8 format. The alpha is used for some kind of mask, differentiating meshes (seemingly based on skinned and destructible properties). The last color texture is also R8G8B8A8 and it stores different material properties. Finally there is a depth buffer. This is the basic setup for most draws, but in some cases there are some extra rendertargets. I will discuss those later.
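For reference, packing a unit normal into an R10G10B10A2 target usually just means remapping each component from [-1, 1] to a 10-bit integer. The exact encoding the game uses is an assumption here; this sketch shows the plain quantization variant.

```python
# Illustrative R10G10B10A2 normal packing: each component remapped from
# [-1, 1] to 10 bits. The game's actual encoding may differ.
def pack_normal_10_10_10_2(nx, ny, nz, a2=0):
    def q10(v):
        return max(0, min(1023, round((v * 0.5 + 0.5) * 1023)))
    return q10(nx) | (q10(ny) << 10) | (q10(nz) << 20) | ((a2 & 0x3) << 30)

def unpack_normal_10_10_10_2(packed):
    def d10(bits):
        return (bits / 1023) * 2.0 - 1.0
    return (d10(packed & 0x3FF), d10((packed >> 10) & 0x3FF), d10((packed >> 20) & 0x3FF))
```

The worst-case round-trip error per component is about 1/1023, which is plenty for lighting; the 2 leftover alpha bits are exactly the kind of place a small per-pixel flag would live, which is why the empty alpha channel is surprising.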
The interesting thing about this setup is that the game renders a lot of things in the emissive texture. Most of the objects in the distance are rendered only there and lit during GBuffer filling. This means that the game really uses a hybrid approach between forward and deferred rendering, depending on the distance. This can be seen below where I brightened the emissive texture to reveal all the content that is actually rendered into this rendertarget. Since everything happens in one pass and thus uses the same depth buffer there is just blackness where the foreground objects would be rendering.
In fact as mentioned before, many of the meshes are rendered in a way that they are rendered over later, so those parts where the foreground objects are hiding the forward shaded objects are actually rendered over with just pure black, probably wasting quite a bit of performance in the process.
The additional 2 rendertargets that some of the objects render to in this pass are a screen-space velocity target with R16G16 format, and an R16 texture for which I haven’t found any usage later in the frame.
The GBuffer filling is finished off with rendering the water surface seen surrounding the three islands of the game. There doesn’t seem to be any culling for these because they pretty much always render, even if the water is not visible in the view. The water is also extremely detailed even in the distance. When Batman is standing on top of a building (as in the capture I’m showing the screenshots from) there are 14 drawcalls for the water with over 77000 indices each plus a much smaller drawcall for a skirt mesh to hide the edges. As an example see the furthest water mesh below with 77844 indices.
And with that the GBuffer is finally filled.
The next pass is for occlusion queries; this probably prepares visibility information for the next frame, or more likely 2 frames down the line. The occlusion queries seem to be hierarchical, first querying individual boxes and then 16 boxes in one drawcall. Occasionally there is also a 4 or 8 box batch, but I only ever found one of those, so it’s probably just overflow. Since the occlusion queries don’t actually write any color, I’m not going to add a screenshot for this pass.
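The batch sizes observed in the capture are consistent with a simple greedy split into power-of-two batches; here is a sketch of that interpretation (the batching logic is my guess from the drawcall pattern, not confirmed):

```python
# Greedy batching of occlusion-query boxes into the draw sizes seen in the
# capture: mostly 16-box batches, with smaller ones mopping up the remainder.
def batch_queries(n, batch_sizes=(16, 8, 4, 1)):
    """Split n query boxes into batches, largest first."""
    batches = []
    for size in batch_sizes:
        while n >= size:
            batches.append(size)
            n -= size
    return batches

print(batch_queries(37))  # e.g. [16, 16, 4, 1]
```

This would explain why the 4 and 8 box batches are so rare: they only show up when the box count isn’t a multiple of 16.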
For these next passes I will use a different capture because it illustrates the point much better but the pipeline is the same in both cases so bear with me and we will be back to the vista.
After the queries there is a quick pass for rendering the planar reflection of Batman. This is later used to block out reflections. The texture for this is quarter resolution and it has R8 format, so it’s really just a mask, no actual reflection colors rendered, but I guess with Batman’s armor and cape being black nothing is lost. In most cases the camera is angled in a way that this silhouette is not visible in the reflection, so as I mentioned above I’m going to use another capture to demonstrate this.
After the reflection texture, there are a couple of drawcalls for skin and eyes, and we get to the lighting phase.
The lighting is also done in a compute shader. There is one dispatch call that calculates the diffuse and specular lighting into the two halves of a texture. This texture has the same resolution as the final output, but since it holds both components side by side, each one is effectively computed at half resolution.
Then there are two compute shaders that use this lit texture and the GBuffers to scale up the lighting information in a checkerboarded way.
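As a rough illustration of what such a checkerboard reconstruction does: half the pixels have lit values, and the holes are filled from their lit neighbours. Real shaders weight neighbours by depth and normal similarity and run on the GPU; this CPU sketch averages uniformly and is only meant to show the idea.

```python
# Conceptual checkerboard reconstruction: fill each missing pixel from its
# lit 4-neighbourhood. In a proper checkerboard pattern every hole has at
# least one lit neighbour.
def checkerboard_fill(lit, width, height):
    """lit[y][x] is None for pixels skipped by the checkerboard pattern."""
    out = [row[:] for row in lit]
    for y in range(height):
        for x in range(width):
            if out[y][x] is not None:
                continue
            neighbours = [lit[ny][nx]
                          for ny, nx in ((y - 1, x), (y + 1, x), (y, x - 1), (y, x + 1))
                          if 0 <= ny < height and 0 <= nx < width
                          and lit[ny][nx] is not None]
            out[y][x] = sum(neighbours) / len(neighbours)
    return out
```

Splitting the work this way halves the cost of the expensive lighting shader at the price of a cheap reconstruction pass, which is presumably the tradeoff the engineers were after.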
Now let’s get back to my original capture for the rest of the frame. We left off before the reflection and lighting, so let’s see how the image looks after those for our rooftop viewpoint.
I had to considerably brighten the image to see something meaningful, otherwise this image would be almost completely black. This is part of why I had to choose another capture to demonstrate the lighting.
After the lighting there are a couple more objects rendered into the velocity buffer, and finally we get to rendering the transparencies. These are rendered directly on top of the opaque scene; there doesn’t seem to be any trick to optimize them.
Two interesting drawcalls in this pass are the rain particles. There are 10240 raindrops and the same number of splashes. For these, the vertex buffer is generated by two compute shaders which use values from the depth and normal textures as well as a heightmap of the game world.
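A fixed pool of drops like this is typically recycled around the camera: each drop falls until the heightmap says it hit something, then respawns above the camera (and its impact point feeds the splash buffer). This is a guess at the scheme based on the inputs the compute shaders read; all constants here are invented.

```python
# Hypothetical raindrop recycling, mimicking what a compute shader with a
# world heightmap as input might do. Spawn radius and height are made up.
import random

def update_raindrop(pos, velocity, dt, camera_pos, spawn_radius, height_at):
    """Advance one raindrop; respawn it above the camera once it hits ground."""
    x, y, z = pos
    y += velocity * dt  # velocity is negative: drops fall along -y
    if y <= height_at(x, z):  # hit the heightmap: this spot becomes a splash
        x = camera_pos[0] + random.uniform(-spawn_radius, spawn_radius)
        z = camera_pos[2] + random.uniform(-spawn_radius, spawn_radius)
        y = camera_pos[1] + 30.0  # respawn above the camera
    return (x, y, z)
```

With 10240 drops this is embarrassingly parallel, one thread per particle, which is exactly the kind of workload compute shaders are good at.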
After the rain there is a pass that renders volumetric lights; this is achieved by rendering the light contribution into an R32 float texture and then blending it onto the lit texture. Some lights also have a few billboards attached to them to serve as flares.
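A scalar target like this is typically filled by marching the view ray and accumulating in-scattering; here is a generic sketch of that accumulation (the density value and the whole marching scheme are assumptions, not something read from the capture):

```python
# Generic single-scattering raymarch into a single float, the kind of value
# an R32 volumetric target would hold. All constants are invented.
import math

def accumulate_volumetric(ray_samples, light_fn, step_len, density=0.02):
    """light_fn(p) returns the light reaching sample point p (0 if shadowed)."""
    scattering = 0.0
    transmittance = 1.0
    for p in ray_samples:
        # Light scattered toward the eye at this step, dimmed by the fog
        # already traversed.
        scattering += transmittance * light_fn(p) * density * step_len
        transmittance *= math.exp(-density * step_len)
    return scattering
```

Keeping only a scalar works because the blend pass can tint it with the light color, which also explains why one R32 texture is enough.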
When all the transparent objects have finished rendering, we are ready to start the post process pipeline, but first let’s see where we are. Again I have to mention that I had to brighten this texture considerably.
Unsurprisingly, the post process phase also uses a lot of compute shaders mixed with regular pixel/vertex shader techniques. This phase starts with an edge detection based antialiasing. Five years ago temporal techniques were already quite popular, but it was probably cheaper to use a well-optimized earlier solution. The edge detection texture has R8G8 format, where red seems to contain the vertical edges and green the horizontal ones.
This texture is further processed and then used by a compute shader to output the antialiased image. Let’s see it demonstrated on Batman’s ears (brightened even more to make it clear).
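The two-channel edge texture matches the first pass of MLAA/SMAA-style techniques: compare each pixel with its left and top neighbours and flag vertical and horizontal edges in separate channels. The luma-difference metric and threshold below are my assumptions; the game may well compare depth or color instead.

```python
# MLAA/SMAA-style edge detection sketch: red channel flags a vertical edge
# (difference to the left neighbour), green a horizontal one (to the top).
def detect_edges(luma, width, height, threshold=0.1):
    edges = [[(0.0, 0.0)] * width for _ in range(height)]
    for y in range(height):
        for x in range(width):
            left = abs(luma[y][x] - luma[y][x - 1]) > threshold if x > 0 else False
            top = abs(luma[y][x] - luma[y - 1][x]) > threshold if y > 0 else False
            edges[y][x] = (1.0 if left else 0.0, 1.0 if top else 0.0)
    return edges
```

The later passes then walk along these flagged edges to estimate coverage and blend neighbouring pixels accordingly, which is what produces the smoothed silhouette on Batman’s ears.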
After the AA, there is a compute-based downsample and upsample chain, with one interesting drawcall in the middle. After the texture has been downsampled all the way to 8×5, there is a drawcall with a lot of vertices to create some more lens flares. This is an interesting way of doing it, as they have to find where the flare mesh needs to be and how big it should be based on the brightness of the light source. I will demonstrate this with yet another capture because it’s not very visible in the rooftop one.
This texture is then used in a compute shader that adds bloom, lens flares, tone mapping and color grading all in one go.
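Folding all of these into one pass saves bandwidth: each pixel is read once, combined, and written once. Here is a minimal per-pixel sketch of that structure, using Reinhard as a stand-in for whatever tone mapping curve the game actually uses (the curve, the bloom strength and the omission of color grading are all my simplifications):

```python
# Single "uber" resolve pass sketch: bloom is added in linear HDR, then the
# result is tone mapped. Reinhard is a stand-in curve, not the game's.
def resolve_pixel(scene_rgb, bloom_rgb, bloom_strength=0.3, exposure=1.0):
    out = []
    for scene, bloom in zip(scene_rgb, bloom_rgb):
        c = (scene + bloom * bloom_strength) * exposure  # bloom before tone map
        c = c / (1.0 + c)  # Reinhard: maps [0, inf) into [0, 1)
        out.append(c)
    return tuple(out)
```

A color grading step would typically follow the tone map as a 3D LUT lookup on the now low-dynamic-range value, which is cheap enough to live in the same shader.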
The same state for our rooftop scene looks like this. This finally doesn’t require any brightening because the tone mapping already took care of it.
After this there are only a couple of UI drawcalls, and we arrive at the final image already seen at the beginning.
Phew, that was quite a ride. I hope using multiple captures didn’t make this analysis hard to follow.
Previously I only checked very recent games, but this time I chose something older. Even though the game is not new, the steps of the rendering can still be interesting. The results speak for themselves and the city presented by the game is filled with atmosphere. Hopefully we will see a new Batman game soon that will be at least as attractive as this one, both from the art and the tech perspective. I’m sure the engineers learned a lot from making this one.
As always, if there is something more you would like to know, or you would like to see your favorite game analyzed, please leave a comment or let me know via Twitter.