The Metal Gear series achieved world-wide recognition when Metal Gear Solid became a best-seller on the original PlayStation almost two decades ago. The title introduced many players to the genre of “tactical espionage action”, an expression coined by Hideo Kojima the creator of the franchise.
The final chapter Metal Gear Solid V: The Phantom Pain was released in 2015 and brings the series to a whole new level of graphics quality thanks to the Fox Engine developed by Kojima Productions. The analysis below is based on the PC version of the game with all the quality knobs set to maximum. Some of the information I present here has already been made public in the GDC 2013 session “Photorealism Through the Eyes of a FOX”.
Dissecting a Frame
Here is a frame taken from the very beginning of the game, during the prologue when Snake tries to make his way out of the hospital. Snake is lying on the floor trying to blend in among the other corpses, he’s at the bottom of the screen with his naked shoulder. Not the most glamorous scene but it illustrates well the different effects the engine can achieve.
Right in front of Snake two soldiers are standing up, they’re looking at some burning silhouette at the end of the hallway. I’ll simply refer to that mysterious individual as the “Man on Fire” not to spoil anything about the story.
So let’s see how this frame is rendered!
This pass renders only the geometry of the terrain underneath the hospital as viewed from the point of view of the player and outputs its depth information to a depth buffer. The terrain mesh is generated from the heightmap you can see below: it’s a 16-bit floating point texture containing the terrain elevation value (view from the top). The engine divides the heightmap into different tiles, for each tile a draw call is dispatched with a flat grid of 16x16 vertices. The vertex shader reads the heightmap and modifies on-the-fly the vertex position to match the elevation value. The terrain is rasterized in about 150 draw calls.
MGS V uses a deferred renderer like many games of its generation, if you already read the GTA V study you will notice several similar elements. So instead of calculating directly the final lighting value of each pixel as the scene is rendered, the engine first stores the properties of each pixels (like albedo colors, normals…) in several render targets called G-Buffer and will later combine all this information together.
All the following buffers are generated at the same time: