UE4 Optimized Post-Effects

At my day job I get to optimize several games for the Nintendo Switch or the NVIDIA Shield, some of them using Unreal Engine 4.
While UE4 is very powerful and offers a large selection of knobs to balance visual quality and performance, some of the post-effects can end up being quite heavy on a Tegra X1 GPU, even at the lowest quality settings.

Below is a small collection of customizations/hacks I wrote in order to reduce the runtime cost of certain effects while staying as close as possible to the original visuals of vanilla UE4. The idea is to provide a drop-in replacement you can easily integrate into your own game to achieve better performance on an X1. Here I will be mainly writing about:

Bokeh Depth-of-Field

Depth-of-field techniques have changed a lot in the latest UE4 versions, with some getting deprecated in favor of the new DiaphragmDOF implementation. Historically UE4 supported 3 different approaches:

Here I will be writing about a drop-in replacement for BokehDOF called GatherDOF.

BokehDOF produces very pleasant visual results, but its main issue is its bandwidth cost.
To give some idea of how BokehDOF operates: it basically spawns one bokeh sprite per original pixel of the scene, with each sprite's size proportional to the circle-of-confusion value of the pixel it originates from.
(See also the MGS V graphics study for more in-depth insights; it uses roughly the same method.)

BokehDOF Sprite

For a 1080p scene that means drawing around 2 million quads, blended on top of each other. As pixels get further out of focus the sprite size increases and performance takes a nose-dive, with quad overdraw saturating the bandwidth.
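To put rough numbers on this, here is a back-of-the-envelope sketch. It assumes (my simplification, not something measured in UE4) that every pixel spawns a square sprite and that the whole frame shares the same circle-of-confusion diameter; `bokeh_scatter_cost` is a hypothetical helper name.

```python
def bokeh_scatter_cost(width, height, coc_diameter_px):
    """Return (quad_count, blended_pixels) for a one-sprite-per-pixel scatter.

    Assumption: every sprite is a square of coc_diameter_px x coc_diameter_px
    screen pixels, and the whole frame has the same circle of confusion.
    """
    quad_count = width * height
    blended_pixels = quad_count * coc_diameter_px * coc_diameter_px
    return quad_count, blended_pixels


quads, blended = bokeh_scatter_cost(1920, 1080, 8)
print(quads)    # 2073600
print(blended)  # 132710400
```

Under these assumptions an 8-pixel-wide blur already means each screen pixel is blended 64 times on average, and that factor grows with the square of the sprite size.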

GatherDOF was implemented with the idea of producing a visual result close to BokehDOF but with a different approach: “gather” neighbor texel values instead of “scattering” bokeh sprites. The two algorithms are completely different, and so is their cost, but the end result is visually quite similar:

( TX1@768MHz )
( TX1@768MHz )
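To illustrate the scatter/gather distinction in general terms, here is a toy 1D sketch. This is not the GatherDOF shader; it only assumes a box-shaped bokeh and an integer circle-of-confusion radius per pixel, and the function names are made up for the example.

```python
def scatter_blur(colors, coc):
    """Scatter: each source pixel spreads its energy over its own CoC."""
    n = len(colors)
    out = [0.0] * n
    for i, (c, r) in enumerate(zip(colors, coc)):
        weight = c / (2 * r + 1)  # energy is shared by 2r+1 covered pixels
        for j in range(max(0, i - r), min(n, i + r + 1)):
            out[j] += weight
    return out


def gather_blur(colors, coc, max_radius):
    """Gather: each output pixel reads the neighbors whose CoC reaches it."""
    n = len(colors)
    out = [0.0] * n
    for j in range(n):
        for i in range(max(0, j - max_radius), min(n, j + max_radius + 1)):
            r = coc[i]
            if abs(i - j) <= r:  # does neighbor i's bokeh cover pixel j?
                out[j] += colors[i] / (2 * r + 1)
    return out


colors = [0.0, 0.0, 1.0, 0.0, 0.0, 0.5, 0.0]
coc = [0, 0, 2, 0, 0, 1, 0]
print(scatter_blur(colors, coc) == gather_blur(colors, coc, max_radius=2))  # True
```

In this toy both loops end up adding the same contributions; the practical difference is that the gather version writes each output pixel exactly once and has a read count bounded by `max_radius`, instead of blending a variable number of overlapping sprites.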

Metal Gear Solid V - Graphics Study

The Metal Gear series achieved world-wide recognition when Metal Gear Solid became a best-seller on the original PlayStation almost two decades ago. The title introduced many players to the genre of “tactical espionage action”, an expression coined by Hideo Kojima the creator of the franchise.

Though in my case the first time I played as Snake wasn’t with this game but with the Ghost Babel spin-off on GBC, a lesser-known but nevertheless excellent title with an impressive depth.

The final chapter Metal Gear Solid V: The Phantom Pain was released in 2015 and brings the series to a whole new level of graphics quality thanks to the Fox Engine developed by Kojima Productions. The analysis below is based on the PC version of the game with all the quality knobs set to maximum. Some of the information I present here has already been made public in the GDC 2013 session “Photorealism Through the Eyes of a FOX”.

Dissecting a Frame

Here is a frame taken from the very beginning of the game, during the prologue when Snake tries to make his way out of the hospital. Snake is lying on the floor trying to blend in among the other corpses: he’s at the bottom of the screen, with his naked shoulder. Not the most glamorous scene, but it illustrates well the different effects the engine can achieve.

Right in front of Snake two soldiers are standing up, looking at some burning silhouette at the end of the hallway. I’ll simply refer to that mysterious individual as the “Man on Fire” so as not to spoil anything about the story.

So let’s see how this frame is rendered!

Depth Pre-Pass

This pass renders only the geometry of the terrain underneath the hospital, as seen from the player’s point of view, and outputs its depth information to a depth buffer. The terrain mesh is generated from the heightmap you can see below: it’s a 16-bit floating-point texture containing the terrain elevation value (viewed from the top). The engine divides the heightmap into different tiles, and for each tile a draw call is dispatched with a flat grid of 16x16 vertices. The vertex shader reads the heightmap and modifies the vertex position on the fly to match the elevation value. The terrain is rasterized in about 150 draw calls.
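The per-tile displacement can be sketched on the CPU as follows. This is only an illustration of the idea, not Fox Engine code: the `tile_size` parameter, the nearest-texel fetch and the function name are assumptions of mine.

```python
GRID = 16  # vertices per side of the flat grid, as described above


def displace_tile(heightmap, tile_x, tile_y, tile_size):
    """Build the GRID x GRID vertices of one tile, lifted by the heightmap.

    heightmap: 2D list of elevations, indexed as heightmap[y][x].
    tile_size: tile side length in heightmap texels (hypothetical parameter).
    Returns a list of (x, y, elevation) tuples.
    """
    vertices = []
    for j in range(GRID):
        for i in range(GRID):
            # Position of the vertex on the flat grid, in texel coordinates.
            x = tile_x * tile_size + i * (tile_size - 1) / (GRID - 1)
            y = tile_y * tile_size + j * (tile_size - 1) / (GRID - 1)
            # Nearest-texel fetch stands in for the vertex shader texture read.
            elevation = heightmap[round(y)][round(x)]
            vertices.append((x, y, elevation))
    return vertices


# A 64x64 heightmap: flat at 0 in the top half, raised to 10 in the bottom half.
heightmap = [[0.0] * 64 for _ in range(32)] + [[10.0] * 64 for _ in range(32)]
tile = displace_tile(heightmap, tile_x=0, tile_y=1, tile_size=32)
print(len(tile))   # 256
print(tile[0][2])  # 10.0
```

Each call corresponds to one draw call in the description above: the same flat grid is reused for every tile and only the heightmap region it reads from changes.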

G-Buffer Generation

MGS V uses a deferred renderer like many games of its generation; if you have already read the GTA V study you will notice several similar elements. So instead of directly calculating the final lighting value of each pixel as the scene is rendered, the engine first stores the properties of each pixel (like albedo colors, normals…) in several render targets called the G-Buffer and later combines all this information together.

All the following buffers are generated at the same time:

G-Buffer Generation: 25%, 50%, 75%, 100%

Beware of Transparent Pixels

If you use sprites with transparency in your game — and you most likely do for the UI at least — you’ll probably want to pay attention to the fully transparent pixels of your textures (or should I say “texels”).

Even with an alpha of 0, a pixel still has some RGB color value associated with it.
That color shouldn’t matter, right? After all if the pixel is completely transparent, who cares about its color…

Well, this color does actually matter: getting it wrong leads to artifacts which are present in many games out there. Most of the time the corruption is very subtle and you won’t notice it, but in some cases it really stands out.

Corruption Example

Time for a real-world example! Here is the XMB of my PS3 (home menu), browsing some game demos installed.
At first Limbo is selected, I just press “Up” to move the focus to The Unfinished Swan (both great games by the way).

Press up.
Limbo moves down.
White background fades in.

Do you see what’s happening in the Limbo logo area?
The Unfinished Swan white background fades in and we end up with the Limbo logo (pure white) drawn on top of a background which is also pure white. Everything should be completely white in that area, so why do we have these weird gray pixels?

The corruption most likely comes from the Limbo texture using wrong RGB colors for its fully transparent pixels.

Texture Filtering

The artifacts are actually due to the way the GPU filters a texture when it renders a sprite on the screen. Let’s see how this all works with a simple example.

Here is a tiny 12x12 pixel texture of a red cross:

On the left is a zoomed view; the checkerboard pattern is just here to emphasize the completely transparent area with an alpha of 0.
You could use this sprite as an icon to display health points in your UI or as a texture for some in-game med-kit model… (No wait! You don’t want to do this actually!)

Let’s make three versions of this sprite, by changing only the color value of the pixels with an alpha of 0.

(You can download the image files and do some color picking to confirm the RGB value of the transparent pixels)

These 3 sprites look exactly the same on your screen, right? It makes sense: we only modified the color value of the transparent pixels, which end up being invisible anyway.
But let’s see what happens when these sprites are in motion. Here is a zoomed view to better see the screen pixels:

Well that’s quite some corruption we have here! Some brown tint for the first sprite, purple tint for the second one.
The 3rd one… is correct, it’s how it’s supposed to look.

Let’s focus on the blue version:

As you can see the issue happens when the texture is not perfectly pixel-aligned with the screen pixels. This can be explained by the bilinear filtering the GPU performs when rendering a sprite on the screen: when sampling a texture, the GPU averages the color values of the closest neighbors of the coordinates requested, both in the vertical and horizontal direction.

So if we consider the case where the sprite is misaligned by exactly half a pixel:

Each pixel of the screen will sample the sprite texture right in the middle between 2 texels. This is what happens with the pixel you see on the left: it fetches the sprite texture in the middle between a solid red texel and a transparent blue texel. The averaged color is:

$$
0.5 \cdot \begin{bmatrix} \color{#ff2c2c}{1} \\ \color{#00c300}{0} \\ \color{#2f9fff}{0} \\ 1 \end{bmatrix}
+ 0.5 \cdot \begin{bmatrix} \color{#ff2c2c}{0} \\ \color{#00c300}{0} \\ \color{#2f9fff}{1} \\ 0 \end{bmatrix}
= \begin{bmatrix} \color{#ff2c2c}{0.5} \\ \color{#00c300}{0} \\ \color{#2f9fff}{0.5} \\ 0.5 \end{bmatrix}
$$

Which is some translucent purple looking like this:
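The same averaging can be reproduced in a few lines. This is a minimal sketch of a two-texel linear blend, not of a full GPU sampler; `lerp_rgba` is a hypothetical helper, and the "transparent red" texel is my own comparison case, chosen so the transparent texel carries the same RGB as its opaque neighbor.

```python
def lerp_rgba(a, b, t):
    """Linearly interpolate two RGBA texels, channel by channel."""
    return tuple((1.0 - t) * ca + t * cb for ca, cb in zip(a, b))


solid_red = (1.0, 0.0, 0.0, 1.0)
transparent_blue = (0.0, 0.0, 1.0, 0.0)  # alpha 0, but blue RGB
transparent_red = (1.0, 0.0, 0.0, 0.0)   # alpha 0, same RGB as the cross

# Sampling exactly halfway between the two texels (t = 0.5):
print(lerp_rgba(solid_red, transparent_blue, 0.5))  # (0.5, 0.0, 0.5, 0.5)
print(lerp_rgba(solid_red, transparent_red, 0.5))   # (1.0, 0.0, 0.0, 0.5)
```

The first result is the purple from the equation above. In the second case the hidden color matches the visible one, so the blend stays pure red and only the alpha fades.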

DOOM (2016) - Graphics Study

DOOM pioneered fundamental changes in game design and mechanics back in 1993. It was a world-wide phenomenon which propelled to fame iconic figures like John Carmack and John Romero.

23 years later, id Software now belongs to ZeniMax and all the original founders are gone, but that didn’t prevent the team at id from showing all its talent by delivering a great game.

The new DOOM is a perfect addition to the franchise, using the new id Tech 6 engine where ex-Crytek Tiago Sousa now assumes the role of lead rendering programmer after John Carmack’s departure.
Historically id Software is known for open-sourcing their engines after a few years, which often leads to nice remakes and breakdowns. Whether this will stand true with id Tech 6 remains to be seen but we don’t necessarily need the source code to appreciate the nice graphics techniques implemented in the engine.

How a Frame is Rendered

We’ll examine the scene below where the player attacks a Gore Nest defended by some Possessed enemies, right after obtaining the Praetor Suit at the beginning of the game.

Unlike most Windows games released these days, DOOM doesn’t use Direct3D but offers OpenGL and Vulkan backends.
Vulkan being the new hot thing and Baldur Karlsson having recently added support for it in RenderDoc, it was hard to resist peeking into DOOM’s internals. The following observations are based on the game running with Vulkan on a GTX 980 with all the settings on Ultra; some are guesses, others are taken from the SIGGRAPH presentation by Tiago Sousa and Jean Geffroy.

Mega-Texture Update

The first step is the Mega-Texture update, a technique already present in id Tech 5 (used in RAGE) and now also used in DOOM.
To give a very basic explanation, the idea is that a few huge textures (16k x 8k in DOOM) are allocated in GPU memory, each of these being a collection of 128x128 tiles.

16k x 8k storage with 128 x 128 pages

All these tiles are supposed to represent the ideal set of actual textures, at the right mipmap level, which will be needed by the pixel shaders later to render the particular scene you’re looking at.
When the pixel shader reads from a “virtual texture” it simply ends up reading from some of these 128x128 physical tiles.
Of course, depending on where the player is looking, this set is going to change: new models will appear on screen, referencing other virtual textures, new tiles must be streamed in, old ones streamed out…
So at the beginning of a frame, DOOM updates a few tiles through vkCmdCopyBufferToImage to bring some actual texture data into the GPU memory.
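As a rough sketch of the bookkeeping involved: the snippet below counts how many pages fit in one storage texture and translates a virtual texel into a physical one. It assumes "16k x 8k" means 16384 x 8192 texels, and it uses a plain dictionary as the page table and ignores mip levels, both simplifications of mine rather than how id Tech 6 stores this.

```python
PAGE = 128  # page (tile) side length in texels, from the text above


def pages_in_storage(width, height, page=PAGE):
    """Number of page-sized tiles that fit in one physical storage texture."""
    return (width // page) * (height // page)


def physical_texel(page_table, vx, vy, page=PAGE):
    """Translate a virtual texel (vx, vy) into physical storage coordinates.

    page_table maps (virtual_page_x, virtual_page_y) to
    (physical_page_x, physical_page_y); a hypothetical layout for this sketch.
    Returns None when the page is not resident and would have to be streamed in.
    """
    key = (vx // page, vy // page)
    if key not in page_table:
        return None
    px, py = page_table[key]
    return (px * page + vx % page, py * page + vy % page)


print(pages_in_storage(16384, 8192))  # 8192

# Virtual page (3, 1) currently lives in physical page (0, 2).
table = {(3, 1): (0, 2)}
print(physical_texel(table, 3 * 128 + 5, 1 * 128 + 7))  # (5, 263)
print(physical_texel(table, 0, 0))                      # None
```

A lookup returning `None` corresponds to the case where a tile has to be uploaded at the beginning of a later frame.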

More information about Mega-Textures here and here.

Shadow Map Atlas

For each light casting a shadow, a unique depth map is generated and saved into one tile of a giant 8k x 8k texture atlas. However not every single depth map is calculated at every frame: DOOM heavily re-uses the result of the previous frame and regenerates only the depth maps which need to be updated.

8k x 8k Depth Buffer (Previous Frame)
8k x 8k Depth Buffer (Current Frame)

When a light is static and casts shadows only on static objects it makes sense to simply keep its depth map as-is instead of doing unnecessary re-calculation. If some enemy is moving under the light though, the depth map must be generated again.
Depth map sizes can vary depending on the light’s distance from the camera, and re-generated depth maps don’t necessarily stay inside the same tile within the atlas.
DOOM has specific optimizations like caching the static portion of a depth map, then computing only the projection of the dynamic meshes and compositing the results.
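The reuse logic described above boils down to a per-light "is my cached depth map still valid?" test. Here is a minimal sketch of such a test; the `Light` fields and the function name are hypothetical, and the real engine certainly tracks more state than this.

```python
from dataclasses import dataclass


@dataclass
class Light:
    name: str
    moved: bool           # the light itself changed since the previous frame
    dynamic_casters: int  # moving objects currently casting shadows for it


def lights_to_regenerate(lights, cached_names):
    """Return the names of lights whose depth map must be rendered again."""
    dirty = []
    for light in lights:
        has_cache = light.name in cached_names
        if not has_cache or light.moved or light.dynamic_casters > 0:
            dirty.append(light.name)
    return dirty


lights = [
    Light("corridor_lamp", moved=False, dynamic_casters=0),  # fully static
    Light("gore_nest_glow", moved=False, dynamic_casters=2), # enemies moving
    Light("new_spotlight", moved=False, dynamic_casters=0),  # never rendered
]
print(lights_to_regenerate(lights, cached_names={"corridor_lamp", "gore_nest_glow"}))
# ['gore_nest_glow', 'new_spotlight']
```

Only the lights in the returned list need new draw calls; every other tile of the atlas is kept as-is from the previous frame.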

GTA V - Graphics Study - Part 3

Post Processing Effects

Post-effects are performed once the scene has been rendered, to enhance, fix artifacts, change the mood…
We saw in Part 1 how a few post-effects were applied, like bloom, anti-aliasing or tone-mapping. But there are several other effects used in GTA V.

Lens Flares & Light Streaks

When the light goes through a real-world lens, the scattering and internal reflections sometimes cause artifacts.
What I call “lens flares” here is a collection of bright spots along the axis defined by a bright light source and the center of the screen. There are also “light streaks” which are rays originating from the light source. These artifacts are very common in movies, when they are added to a game frame they can give a kind of “cinematic” feeling.

Lens Flares & Light Streaks

There are usually 2 ways to render such artifacts:

  • image-based: extracting the brightest areas, duplicating and deforming them. Works for any number of bright light sources.
  • sprite-based: adding textured-sprites and managing their positions manually. Each light source must be handled separately but artists can have more control over the artifact shape, color, intensity…

GTA V actually uses both techniques. The image-based approach is used to add a subtle blue halo at the bottom-left corner of the image: it’s actually a mirrored copy of the bright-pass buffer. But the most visible artifacts in this scene result from the sprite-based approach, which is applied to the sun only. First, light streaks are added by rendering 12 rotated quads centered around the sun. Then for the lens flares, 70 sprites are drawn along the “sun – screen center” axis. Artifacts come closer to each other as the camera points towards the sun.

Base + Light Streaks
Base + Light Streaks + Lens Flares
Base + Light Streaks + Lens Flares (Wireframe Highlight)
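Placing sprites along the “sun – screen center” axis can be sketched like this. The even spacing and the extent of the axis (from the sun to its mirror image across the center) are assumptions of mine for the illustration; the actual positions in GTA V are authored per sprite.

```python
def flare_positions(sun, center, count):
    """Place `count` sprites (count >= 2) on the line through sun and center.

    Sprites go from the sun position to its mirror image across the screen
    center, evenly spaced (an assumption made for this sketch).
    """
    sx, sy = sun
    cx, cy = center
    positions = []
    for i in range(count):
        s = 1.0 - 2.0 * i / (count - 1)  # goes from +1 (sun) down to -1 (mirror)
        positions.append((cx + (sx - cx) * s, cy + (sy - cy) * s))
    return positions


center = (960, 540)  # middle of a 1920x1080 frame
print(flare_positions((1600, 200), center, 5))
# [(1600.0, 200.0), (1280.0, 370.0), (960.0, 540.0), (640.0, 710.0), (320.0, 880.0)]
```

Since every position is scaled by the sun-to-center offset, the sprites naturally bunch up as the sun approaches the middle of the screen, which matches the behavior described above.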

There are several sprites the engine uses to simulate different lens artifacts:

Base + Flares
Base + Flares (Wireframe)

GTA V is all about attention to detail, and lens flares are no exception: their size is proportional to the aperture of the camera. So if you suddenly look towards the sun, the lens flares are big at first, but then as the aperture narrows to lower the exposure the lens flares become smaller too. The animation below illustrates the phenomenon.
Another nice detail: if you switch to the first-person view, there are barely any lens flares visible, because we are now seeing through human eyes and not through a camera anymore.

Large Aperture
Small Aperture