Foreword
After my previous articles I started looking for another game to dive into, and I ran into the demo version of Shadow of the Tomb Raider. I thought the rendering of this game was probably already really interesting, and the recent patch that enables raytraced shadows made it an even better target. The PC port is made by Nixxes, and they worked together with Nvidia to add this feature to the game. If you want to learn more about how they did it, check out their GDC presentation from a few weeks ago.
Since I wanted to check the acceleration structures I again used PIX and NSight to look under the hood. This was quite fortunate because the game uses the D3D12 debug naming API for almost all resources, so the resources show up with human readable names in PIX. This allowed me to make more assumptions about what is happening, which turned out to be essential because there are plenty of non-obvious steps that might be impossible to understand without this little bit of additional information.
So I started the demo, cranked all settings to the maximum and started capturing.
Frame breakdown
I took a few captures before I ended up choosing the one that takes place in the same square as the one shown in the RTX announcement. You can check the video here if you missed it. Below you can see the final rendered results of the frame I’m going to look into.
One thing to note here is that the game doesn’t use a traditional deferred renderer; it uses Forward+ with clustered lighting and renders the geometry multiple times. This is an interesting tradeoff, and with raytracing enabled there is one more geometry pass to render. All of these geometry passes are nicely ordered front to back, with some small exceptions.
So let’s get started.
Since there is a lot happening in the frame and a large portion is not easy to visualize, I will focus on the parts that are interesting to show. Just note that if you take a capture yourself you will see plenty of draw and dispatch calls I’m not mentioning separately. These are mostly updating buffers. One example is the clustered lighting, which has quite a few steps, but most of it is not possible to understand without extra debug functionality built into the game, which I have no access to. Other than these I will try to show how a frame is assembled.
An interesting thing about the raytraced shadows is that they require a separate (non-jittered) depth prepass. The frame starts out by rendering this; if you want to know the details of why this is needed, check the presentation I mentioned.
After the depth prepass the depth buffer is changed and the game renders two textures. The first one is a regular RGBA8 texture with the vertex normals in the RGB channels and a sky mask in the alpha. The other texture is in RGBA16F format and is later referred to as a velocity texture, but it’s not obvious how it stores the velocity; it uses some kind of compression. The alpha channel seems to contain some kind of object mask, probably to mark which objects have valid velocity data in the texture.
After we have the depth and normal information, the light clustering is executed (more about this later) and the shadow map atlas is filled with shadow maps for the lights that don’t cast raytraced shadows. This atlas fails to display in PIX and doesn’t show properly in NSight either, probably because it’s a partially resident texture of 12288⨯8192 pixels. Only a small portion of this texture is actually filled with data in any of the scenes I checked.
After the shadow maps, the shadow raytracing is executed. As I mentioned, I’m not going to describe the details of this, but later I will show the acceleration structure.
After the lighting and shadowing are ready, another geometry pass is executed (the third one on PC). This pass outputs 3 textures (a rough HLSL sketch of the layout follows the list):
- the HDR lit color of all the meshes rendered in this pass into an RG11B10 texture
- the albedo and roughness into an RGBA8 texture
- the normal mapped normals compressed into the RG channels, the metalness in the B channel and, in the A channel, something that is later called AO in the pipeline but doesn’t really look like AO
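To make that layout a bit more concrete, here’s a minimal HLSL sketch of what the pixel shader output of this pass might look like. The target formats come from the capture, but the normal encoding (I’m guessing something octahedral-like), the placeholder lighting and all the names are my assumptions, not the game’s actual shader.

```hlsl
// Hypothetical sketch of the MRT layout this geometry pass appears to write.
struct PSInput
{
    float4 position  : SV_Position;
    float3 normalWS  : NORMAL;      // normal-mapped normal, world space
    float3 albedo    : COLOR0;      // stand-in for a material texture fetch
    float  roughness : TEXCOORD0;
    float  metalness : TEXCOORD1;
    float  ao        : TEXCOORD2;   // whatever the "AO"-like term actually is
};

struct PSOutput
{
    float4 litColor    : SV_Target0; // RG11B10: HDR lit color
    float4 albedoRough : SV_Target1; // RGBA8: albedo + roughness
    float4 normalMisc  : SV_Target2; // RG: packed normal, B: metalness, A: "AO"
};

// One common two-channel normal encoding (octahedral); the game may use another.
float2 EncodeNormal(float3 n)
{
    n /= (abs(n.x) + abs(n.y) + abs(n.z));
    float2 e = (n.z >= 0.0) ? n.xy : (1.0 - abs(n.yx)) * sign(n.xy);
    return e * 0.5 + 0.5;
}

PSOutput MainPS(PSInput i)
{
    // Placeholder lighting: a single hardcoded directional light, just to keep
    // the sketch self-contained. The real shader evaluates the scene lights.
    float3 L   = normalize(float3(0.3, 0.8, 0.5));
    float3 lit = i.albedo * saturate(dot(normalize(i.normalWS), L));

    PSOutput o;
    o.litColor    = float4(lit, 0.0);
    o.albedoRough = float4(i.albedo, i.roughness);
    o.normalMisc  = float4(EncodeNormal(normalize(i.normalWS)), i.metalness, i.ao);
    return o;
}
```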
With all settings on maximum, the next step is SSAO generation, for which the game uses HBAO+. This is pretty well described elsewhere so I’m not going to spend time on it, just showing the SSAO results here.
After the screen space AO, another pass creates the screen space reflection texture. This is then composited onto the HDR color, but it’s not that visible in this scene.
The next thing is the rendering of the forward elements. This is mostly the hair and eyes of all those characters. Also not that visible in this scene.
The final pass before the postprocessing adds the transparent elements. This is done in two parts: first, transparents are rendered into a half resolution buffer, which is then composited back onto the HDR texture; then all the transparents that are marked as full resolution are rendered.
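For reference, a composite like this usually boils down to a single full-screen pass that upsamples the half resolution buffer and blends it over the HDR target. The sketch below is just the general idea, assuming premultiplied alpha and a plain bilinear upsample; the game may well use a smarter, depth-aware filter, and all the names here are mine.

```hlsl
// Hypothetical full-screen composite of the half resolution transparents.
Texture2D<float4> HalfResTransparents : register(t0); // RGB: premultiplied color, A: coverage (assumed)
SamplerState      LinearClamp         : register(s0);

float4 CompositePS(float4 pos : SV_Position, float2 uv : TEXCOORD0) : SV_Target0
{
    // Bilinear upsample of the half-res buffer.
    float4 t = HalfResTransparents.SampleLevel(LinearClamp, uv, 0);

    // Blended over the HDR target with a standard premultiplied-alpha "over"
    // blend state: SrcBlend = ONE, DestBlend = INV_SRC_ALPHA.
    return t;
}
```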
Finally, the post process stack is executed. This is really similar to other games: there’s the usual bloom, DoF, color grading, TAA, tonemapping, lens flare and camera dirt. I don’t want to bore anyone with the separate results. After the UI is added (not visible in this capture) the frame is finished by mapping the results to SDR if that is needed (I’m not equipped to check HDR right now).
As a little extra, here’s a visual comparison of the vertex normals and the uncompressed normal maps.
As you can see, there are a few interesting steps to achieve the final look. Now that we have a good understanding of how the frame is assembled, let’s look at some details.
Skinning
There’s something here that I haven’t really seen anywhere else before: skinning is done at the beginning of the frame via vertex-shader-only drawcalls that write into UAVs and output degenerate triangles.
This is an interesting idea and I wonder if it has any advantages over skinning in a compute shader, especially since it doesn’t use transform feedback, which would be another way to get the transformed vertices into a buffer. I can’t really show this off in a picture because the results that would be interesting to us are written to UAVs, and there’s no easy way to visualize UAV buffers in PIX or NSight.
The results of this skinning don’t seem to be used in the rendering of the passes I mentioned above; they are used to update the acceleration structure, as mentioned in the GDC presentation. The geometry passes do the skinning in the vertex shader every time.
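Just to illustrate the idea, here’s a minimal sketch of what such a vertex-shader-only skinning draw could look like. The buffer layouts, the names and the single collapsed output position are my assumptions; only the "write to a UAV and emit degenerate triangles" part is what I actually see in the capture.

```hlsl
// Hypothetical vertex-shader-only skinning draw.
struct SkinnedVertexIn
{
    float3 position : POSITION;
    uint4  indices  : BLENDINDICES;
    float4 weights  : BLENDWEIGHT;
};

StructuredBuffer<float4x4> BoneMatrices    : register(t0);
RWStructuredBuffer<float3> SkinnedPosition : register(u0); // consumed later, e.g. for the BLAS update

float4 SkinVS(SkinnedVertexIn v, uint vertexId : SV_VertexID) : SV_Position
{
    // Standard 4-bone linear blend skinning.
    float3 p = 0.0;
    [unroll]
    for (uint i = 0; i < 4; ++i)
        p += v.weights[i] * mul(BoneMatrices[v.indices[i]], float4(v.position, 1.0)).xyz;

    // Store the skinned position for later consumption.
    SkinnedPosition[vertexId] = p;

    // Collapse every vertex to the same point: the resulting triangles are
    // degenerate, so the draw produces no rasterization side effects.
    return float4(0.0, 0.0, 0.0, 1.0);
}
```

One possible advantage over a compute shader is that these draws can reuse the exact same input layouts and vertex fetch path as the regular geometry passes, but that’s just speculation on my part.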
Clustered lighting
Another pass that doesn’t render anything is the light clustering. The light shapes also write to a UAV, but this time from the pixel shader with no rendertarget bound. The shapes themselves are also interesting: instead of a more refined shape, the game approximates spheres with an icosahedron (a D20 for DnD players ;)) and the frustums used for cone lights are just boxes scaled differently at the two ends.
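Without the game’s own debug tooling I can only guess at the details, but the general pattern looks something like the sketch below: rasterize the light proxy mesh with no render target bound and let the pixel shader mark the covered clusters in a UAV. The cluster addressing, the bitmask layout and every name here are assumptions.

```hlsl
// Hypothetical light clustering via rasterized light proxies.
cbuffer LightConstants : register(b0)
{
    uint LightIndex;     // which light's proxy we are rasterizing
    uint ClusterDimX;    // clusters along X
    uint ClusterDimY;    // clusters along Y
    uint TileSizePixels; // screen-space size of one cluster tile
};

// One 32-bit mask word per cluster (so up to 32 lights per word in this sketch).
RWStructuredBuffer<uint> ClusterLightMask : register(u0);

uint DepthToSlice(float deviceDepth)
{
    // Placeholder depth slicing; a real implementation would likely use an
    // exponential mapping of view-space depth.
    return min((uint)(deviceDepth * 16.0), 15u);
}

void LightProxyPS(float4 svPos : SV_Position)
{
    // Find the cluster this fragment of the proxy mesh falls into...
    uint2 tile    = (uint2)svPos.xy / TileSizePixels;
    uint  slice   = DepthToSlice(svPos.z);
    uint  cluster = (slice * ClusterDimY + tile.y) * ClusterDimX + tile.x;

    // ...and flag this light as affecting it. No render target is bound.
    InterlockedOr(ClusterLightMask[cluster], 1u << (LightIndex & 31u));
}
```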
Raytracing
As I mentioned at the beginning of this article, there is a GDC talk specifically about the raytraced shadows. Since the methods are well described there, I will only present some details I think are really worth showing. First, let’s see an overview of our example scene. This is the same scene that I used for the frame analysis above.
In this bird’s eye view it’s clearly visible that only a few meshes around the player are added to the acceleration structure.
Here’s the view from approximately the same place where the camera would be in the game. Unfortunately this image is not that clear to read, but matching it to the images in the frame breakdown should help.
LOD
In my previous article, about Metro: Exodus, I mentioned I couldn’t find any sign of LOD when it comes to raytracing: the objects far away from the player seemed to have the same amount of detail as the closest ones. I was happy to see that there are signs of LOD in Shadow of the Tomb Raider. For example, the characters below look very similar; they probably use the same head mesh. The first one, close to the player, is fully detailed, while the one below it is on the other side of the square and uses a lot fewer triangles. As described in the GDC presentation, this is also necessary to avoid self-shadowing.
Detail
On the other hand, there are many meshes where it would’ve made sense to do some LOD, if that has any impact on the raytracing performance.
For example, in the image above the roof tiles have small cracks modeled, which is really nice to look at but probably has no visible effect in the game.
Or these hanging wires quite far away from the player: they might contribute some shadows, but the mesh complexity seems like overkill.
A few more meshes that are really far away from the player (interestingly, the crate is the same mesh as in the GDC video), yet they don’t seem to have any LODs.
The only error I found in this small capture is the missing lower body and head on one of the characters sitting close to the player. I wonder if this is really how the acceleration structure looks or if this is an error in the visualization.
It is still a mystery whether the performance of the raytracing could be improved by further mesh simplification. I hope we will see some results about this in the near future.
Lens flare
Finally, since lens flare solutions are fascinating, let’s look at how it’s done in this game. I’m using a different scene for this one. The rendering is done directly on the HDR color buffer, via a series of quads. The effect is really faint even in this high contrast scene but it’s quite pleasing.
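The draws suggest the classic sprite-based approach: a handful of small textured quads placed along the axis between the light’s screen position and the screen centre, blended additively onto the HDR buffer. Below is a minimal sketch of that idea; the placement math, the blend state and all the constant names are my guesses, not the game’s actual shaders.

```hlsl
// Hypothetical lens flare sprite: one quad per draw, additively blended.
cbuffer FlareConstants : register(b0)
{
    float2 LightPosNDC;    // light position in [-1,1] screen space
    float  FlareOffset;    // where along the flare axis this quad sits (per draw)
    float  FlareScale;     // quad size (per draw)
    float3 FlareTint;      // per-quad tint
    float  FlareIntensity;
};

Texture2D    FlareSprite : register(t0);
SamplerState LinearClamp : register(s0);

struct VSOut { float4 pos : SV_Position; float2 uv : TEXCOORD0; };

VSOut FlareVS(uint vertexId : SV_VertexID)
{
    // Quad corners from the vertex id (4-vertex triangle strip, ids 0..3).
    float2 corner = float2(vertexId & 1, vertexId >> 1) * 2.0 - 1.0;
    // Slide the quad from the light towards its mirrored position across the centre.
    float2 center = lerp(LightPosNDC, -LightPosNDC, FlareOffset);

    VSOut o;
    o.pos = float4(center + corner * FlareScale, 0.0, 1.0);
    o.uv  = corner * 0.5 + 0.5;
    return o;
}

float4 FlarePS(VSOut i) : SV_Target0
{
    // Blend state assumed additive (SrcBlend = ONE, DestBlend = ONE) onto the HDR buffer.
    float4 s = FlareSprite.Sample(LinearClamp, i.uv);
    return float4(s.rgb * FlareTint * FlareIntensity, 1.0);
}
```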
Final words
I hope you enjoyed this quick look under the hood of Shadow of the Tomb Raider. If there is something about the rendering you would like to know more about, or you have ideas for other games you would like to see analyzed, please leave a comment or let me know via Twitter.
7 thoughts on “Under the hood of Shadow of the Tomb Raider”
Hi, when you say:
”
The final pass before the postprocessing adds the transparent elements. This is done in two parts: first, transparents are rendered into a half resolution buffer, which is then composited back onto the HDR texture; then all the transparents that are marked as full resolution are rendered.
”
I wonder how they solve the situation that one full resolution transparent element is in the middle of half-res transparent elements as below:
camera -> h h h f h h h h
where h means half-res transparent element and f means full-res element.
Great article.
thanks.
This is a pretty standard problem which is usually ignored in solutions like this. The half res transparents are usually assumed to be further away or fuzzy enough that the difference won’t be noticeable. Smoke and mirrors 😉
Thank you!
Hi,
Thanks for this. A few questions come up.
I guess this is considered forward because it generates the final frame before post processing while submitting scene geometry, but it does look from your images like it generates a bunch of 2D buffers along the way. Can you confirm which of these (albedo, normals, roughness, metalness, etc.) actually hit GPU memory as part of the rendering pipeline, and which are diagnostic views from your frame debugger or views you generated by editing shaders in the debugger for our elucidation and interest?
Do you see the engine generating low-res per-tile min and max depth after the z-prepass?
Thanks again!
A.
Hey,
It is actually a hybrid between forward and deferred. Objects closer to the camera are rendered in a deferred fashion, while objects further away are rendered forward. I’m sorry if this was not clear from the post.
I do not remember seeing a depth hierarchy pass.
Feel free to ask more if there’s anything else unclear.
Hi,
Thanks for the clarification :).
Do you think it is a true Forward+ [1,2] (precisely, the use of a compute pass to generate per-tile depth min and max from the depth pre-pass, against which lights are then culled before the main forward rendering pass), or some form of tiled/clustered shading in a forward renderer (for the forward rendered objects), just like Doom (2016) does and like the Unity and Unreal forward renderers do?
Then again, you show light bounding geometry, so are these light proxies rasterized to fragments in a graphics pipeline? That would be different again, more like how lighting is applied in old-school deferred shading, and even further from Forward+, where lights are culled in a compute shader and applied while rendering scene geometry in the main forward pass.
I appreciate the work you put in. Looking forward to more dissections. Perhaps you can analyse some high end mobile games too.
Best,
Andrew
1. https://takahiroharada.files.wordpress.com/2015/04/forward_plus.pdf
2. https://dl.acm.org/doi/10.1145/2407746.2407764, https://sci-hub.tw/https://dl.acm.org/doi/pdf/10.1145/2407746.2407764
It’s not forward plus, it’s just regular forward as far as I can tell.
The light bounding geometry is only used for the meshes closer to the camera which are rendered with deferred.