At my day job I get to optimize several games for the Nintendo Switch or the NVIDIA Shield, some of them using Unreal Engine 4.
While UE4 is very powerful and offers a large selection of knobs to balance visual quality and performance, some of the post-effects can end up being quite heavy on a Tegra X1 GPU, even at the lowest quality settings.
Below is a small collection of customizations/hacks I wrote to reduce the runtime cost of certain effects while staying as close as possible to the original visuals of vanilla UE4. The idea is to provide drop-in replacements you can easily integrate into your own game to achieve better performance on an X1. Here I will be mainly writing about:
Depth-of-field techniques have changed a lot in recent UE4 versions, with some being deprecated in favor of the new DiaphragmDOF implementation. Historically, UE4 supported three different approaches:
Here I will be writing about a drop-in replacement for BokehDOF called GatherDOF.
BokehDOF produces very pleasant visual results, but its main issue is bandwidth cost.
To give some idea of how BokehDOF operates: it basically spawns one bokeh sprite per original pixel of the scene, with each sprite's size proportional to the circle-of-confusion value of the pixel it originates from.
(See also the MGS V graphics study for more in-depth insights; it uses roughly the same method.)
For a 1080p scene that means drawing around 2 million quads, blended on top of each other. As pixels get further out of focus, the sprite size increases and performance takes a nose-dive, with quad overdraw saturating memory bandwidth.
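To put rough numbers on that overdraw, here is a back-of-the-envelope sketch. It is not taken from the UE4 source: the function names are hypothetical, and it assumes one square sprite per source pixel with a side of `2 * coc_radius_px + 1` pixels, an 8-byte (RGBA16F-style) render target, and one read plus one write per blended pixel. It ignores caches and framebuffer compression, so treat the result as an illustrative upper bound rather than a measurement.

```python
# Rough estimate of the blending work done by a scatter-style bokeh DOF.
# All names and constants here are illustrative assumptions, not UE4 code.

def sprite_count(width, height):
    """One sprite (quad) is spawned per source pixel."""
    return width * height


def blended_pixels(width, height, coc_radius_px):
    """Total pixels touched by blending if every sprite had the same radius.

    Assumes a square sprite whose side is 2 * radius + 1 pixels.
    """
    side = 2 * coc_radius_px + 1
    return sprite_count(width, height) * side * side


def blend_traffic_bytes(width, height, coc_radius_px, bytes_per_pixel=8):
    """Read + write traffic for blending, ignoring caches and compression."""
    return blended_pixels(width, height, coc_radius_px) * bytes_per_pixel * 2


if __name__ == "__main__":
    w, h = 1920, 1080
    print(f"sprites per frame: {sprite_count(w, h):,}")
    for radius in (0, 1, 4, 8):
        pixels = blended_pixels(w, h, radius)
        gib = blend_traffic_bytes(w, h, radius) / 2**30
        print(f"radius {radius:>2}px: {pixels:,} blended pixels, ~{gib:.2f} GiB/frame")
```

Under these assumptions the sprite count stays fixed at about 2 million, but the blended area grows with the square of the circle-of-confusion radius: a uniform 4-pixel radius already implies roughly 168 million blended pixels per frame.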
GatherDOF was implemented with the idea of producing a visual result close to BokehDOF but with a different approach: “gather” neighbor texel values instead of “scattering” bokeh sprites. The two algorithms are completely different, and so are their costs, but the end result is visually quite similar: