Part 5.1: Reversing Construction of the Projection Matrix (X & Y Scales)

Now that we have found the function responsible for constructing the Perspective Projection Matrix, let’s begin reversing it to get a clear understanding of how the game engine constructs this matrix every frame!

We won’t be reversing the entire function we stumbled into. The function we found through CE’s “find out what writes to this address” feature is very large: it constructs various matrices (Camera, View, Projection, Inverse Projection, Identity Scaler, ViewProjection, InvProjCamera) and makes multiple calls to Matrix4x4Multiply(), Matrix4x4Inverse(), and even a Calculate View Frustum function. We will only be looking at the Projection and Inverse Projection construction.

Let’s look at where we were initially:

ESP-Image1

The highlighted block is the instruction that writes the first row of the Projection Matrix, and it is what updates our Projection Matrix every frame. Let’s zoom out a bit and see what’s really going on.

It seems to be a part of an if-else block where our instruction is being executed inside the else block.

else block:

ESP-Image1

if block:

ESP-Image1

Now we can see the big picture. The else block is constructing a Perspective Projection Matrix, as per my reasoning in Part 4. The if block seems to be constructing an Orthographic Projection Matrix, my reasoning being these lines:

v51.m128_f32[0] = 2.0 / *(float *)(a1 + 0x3C0); // 2/width

and

v55.m128_f32[0] = (float)(1.0 / *(float *)(a1 + 0x3C4)) + (float)(1.0 / *(float *)(a1 + 0x3C4)); // 2/height

The values inside a1 + 0x3C0 and a1 + 0x3C4 have been checked to be Width and Height using CE during runtime.

These are the expected xScale and yScale values for an Orthographic Projection Matrix. And look where they are stored:

v54 = _mm_unpacklo_ps(_mm_unpacklo_ps(v51, (__m128)0LL), (__m128)0LL);
*(__m128 *)(a1 + 0xF0) = v54; // v54 = [2/width, 0, 0, 0]

and

*(__m128 *)(a1 + 0x100) = _mm_unpacklo_ps((__m128)0LL, _mm_unpacklo_ps(v55, (__m128)0LL)); // Final Unpack = [0, 2/height, 0, 0]

But what the hell is an _mm_unpacklo_ps??

The basic theory is simple:

The “lo” in “unpacklo” means we only take the lowest 64 bits (the two lowest floats) of each 128-bit register, and “ps” stands for “Packed Single”, which tells the CPU to treat the 128-bit register as four 32-bit floats.
This is basically what it does:

Suppose:

Register A: [ A0, A1, A2, A3 ] (listed lowest float first, so A0 is the lowest) (arg1)
Register B: [ B0, B1, B2, B3 ] (arg2)

Result:

[ A0, B0, A1, B1 ] (also listed lowest float first, matching the row comments above)

that’s all.
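To make the nested calls easier to follow, here is a minimal sketch that emulates the interleave with plain floats instead of the real SSE intrinsic. The names Vec4, unpacklo_emulated, row_x and row_y are made up for illustration, and lane 0 is the lowest float.

```c
#include <assert.h> /* only so the results can be checked with assert() */

/* Four packed floats, lowest lane first: f[0] is the lowest float. */
typedef struct { float f[4]; } Vec4;

/* Scalar stand-in for _mm_unpacklo_ps(a, b): interleave the two low lanes. */
static Vec4 unpacklo_emulated(Vec4 a, Vec4 b)
{
    Vec4 r = { { a.f[0], b.f[0], a.f[1], b.f[1] } };
    return r;
}

/* Mirrors unpacklo(unpacklo(v, 0), 0): keeps lane 0 of v, zeroes the rest. */
static Vec4 row_x(Vec4 v)
{
    Vec4 zero = { { 0.0f, 0.0f, 0.0f, 0.0f } };
    return unpacklo_emulated(unpacklo_emulated(v, zero), zero);
}

/* Mirrors unpacklo(0, unpacklo(v, 0)): moves lane 0 of v into lane 1. */
static Vec4 row_y(Vec4 v)
{
    Vec4 zero = { { 0.0f, 0.0f, 0.0f, 0.0f } };
    return unpacklo_emulated(zero, unpacklo_emulated(v, zero));
}
```

Passing a vector whose upper lanes hold junk, such as {2, 9, 9, 9}, gives {2, 0, 0, 0} from row_x and {0, 2, 0, 0} from row_y, which is why the engine can use these nested unpacks to build clean matrix rows from a single scalar.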

As a refresher, a standard Orthographic Projection Matrix looks like this:

\[\begin{bmatrix} \frac{2}{w} & 0 & 0 & 0 \\ 0 & \frac{2}{h} & 0 & 0 \\ 0 & 0 & \frac{1}{z_{far} - z_{near}} & 0 \\ 0 & 0 & -\frac{z_{near}}{z_{far} - z_{near}} & 1 \end{bmatrix}\]
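For comparison, here is a rough sketch of how that matrix could be filled in plain C. Mat4 and ortho_matrix are hypothetical names, the layout is assumed to be row-major (m[row][col]), and this is not the engine's actual code.

```c
#include <assert.h>

/* Row-major 4x4 matrix: m[row][col]. */
typedef struct { float m[4][4]; } Mat4;

/* Fills the orthographic matrix shown above from width, height, zNear, zFar. */
static Mat4 ortho_matrix(float w, float h, float z_near, float z_far)
{
    assert(w > 0.0f && h > 0.0f && z_far > z_near);

    Mat4 r = { { { 0.0f } } };            /* every entry starts at zero */
    float inv_depth = 1.0f / (z_far - z_near);

    r.m[0][0] = 2.0f / w;                 /* xScale */
    r.m[1][1] = 2.0f / h;                 /* yScale */
    r.m[2][2] = inv_depth;                /* 1 / (zFar - zNear) */
    r.m[3][2] = -z_near * inv_depth;      /* -zNear / (zFar - zNear) */
    r.m[3][3] = 1.0f;
    return r;
}
```

The first two rows are exactly the [2/width, 0, 0, 0] and [0, 2/height, 0, 0] vectors the if block writes to a1 + 0xF0 and a1 + 0x100.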

This validates our belief that the if block is constructing an Orthographic Projection Matrix, so the if-else logic would be:

if(shouldConstructOrthoProj) { ConstructOrthoProj(); } else { ConstructPerspectiveProj(); }

Now for the hard part: reversing the complete logic inside the else block…

Reversing Construction of the Perspective-Projection Matrix

Let’s begin Reversing the else block:

ESP-Image1

 1  else
 2  {
 3    v33 = (__m128)*(unsigned int *)(a1 + 0x234);
 4    v33.m128_f32[0] = v33.m128_f32[0] * 0.5;
 5    ucrtBase_Tanf();
 6    v34 = v33;
 7    tanFovXby2_dup = v33.m128_f32[0];
 8    v34.m128_f32[0] = v33.m128_f32[0] / *(float *)(a1 + 0x18);
 9    ucrtBase_aTanf();
10    v35 = (__m128)*(unsigned int *)(a1 + 0x430);
11    v36 = v33;
12    v36.m128_f32[0] = v33.m128_f32[0] * 2.0;
13    if ( v35.m128_f32[0] == 0.0 )
14    {
15      ucrtBase_aTanf();
16      v35 = v34;
17      v35.m128_f32[0] = v34.m128_f32[0] * 2.0;
18    }
19    v36.m128_f32[0] = v36.m128_f32[0] * 0.5;
20    ucrtBase_Tanf();
21    v37 = (__m128)0x3F800000u;
22    v35.m128_f32[0] = v35.m128_f32[0] * 0.5;
23    v38 = v36;
24    v37.m128_f32[0] = 1.0 / v36.m128_f32[0];
25    ucrtBase_Tanf();
26    v39 = (__m128)*(unsigned int *)(a1 + 0x428);
27    v40 = (__m128 *)(a1 + 0xF0);
28    v41 = a1 + 0x130;
29    v42 = v35;
30    v43 = (__m128)0x3F800000u;
31    v44 = (__m128)0x3F800000u;
32    v44.m128_f32[0] = 1.0 / v35.m128_f32[0];
33    v45 = v32;
34    v45.m128_f32[0] = v32.m128_f32[0] - v31.m128_f32[0];
35    v46 = (__m128)*(unsigned int *)(a1 + 0x42C);
36    *(__m128 *)(a1 + 0x100) = _mm_unpacklo_ps((__m128)0LL, _mm_unpacklo_ps(v44, (__m128)0LL));
37    v43.m128_f32[0] = 1.0 / (float)(v32.m128_f32[0] - v31.m128_f32[0]);
38    *(__m128 *)(a1 + 0xF0) = _mm_unpacklo_ps(_mm_unpacklo_ps(v37, (__m128)0LL), (__m128)0LL);
39    v47 = v43;
40    v47.m128_f32[0] = v43.m128_f32[0] * v31.m128_f32[0];
41    *(__m128 *)(a1 + 0x110) = _mm_unpacklo_ps(_mm_unpacklo_ps(v39, v47), _mm_unpacklo_ps(v46, (__m128)0xBF800000));
42    v31.m128_f32[0] = v31.m128_f32[0] * v32.m128_f32[0];
43    v48 = v31;
44    v48.m128_f32[0] = v31.m128_f32[0] * v43.m128_f32[0];
45    v45.m128_f32[0] = v45.m128_f32[0] / v31.m128_f32[0];
46    v49 = _mm_unpacklo_ps(_mm_unpacklo_ps((__m128)0LL, v48), (__m128)0LL);
47    v50 = (__m128)0x3F800000u;
48    *(__m128 *)(a1 + 0x120) = v49;
49    v42.m128_f32[0] = v35.m128_f32[0] * *(float *)(a1 + 0x42C);
50    v38.m128_f32[0] = v36.m128_f32[0] * *(float *)(a1 + 0x428);
51    v50.m128_f32[0] = 1.0 / v32.m128_f32[0];
52    *(__m128 *)(a1 + 0x130) = _mm_unpacklo_ps(_mm_unpacklo_ps(v36, (__m128)0LL), (__m128)0LL);
53    *(__m128 *)(a1 + 0x140) = _mm_unpacklo_ps((__m128)0LL, _mm_unpacklo_ps(v35, (__m128)0LL));
54    *(__m128 *)(a1 + 0x150) = _mm_unpacklo_ps((__m128)0LL, _mm_unpacklo_ps((__m128)0LL, v45));
55    *(__m128 *)(a1 + 0x160) = _mm_unpacklo_ps(_mm_unpacklo_ps(v38, (__m128)0xBF800000), _mm_unpacklo_ps(v42, v50));
56    *(float *)(a1 + 0x1C) = 1.0 / fminf(tanFovXby2, v34.m128_f32[0]);
57  }

xScale Construction:

Because the compiler interleaved the instructions for optimization, the calculations for the X and Y scales are tangled together. Let’s isolate just the xScale logic:

ESP-Image1

Lines highlighted in red are responsible for xScale calculation.

Let’s start with this block

 3  v33 = (__m128)*(unsigned int *)(a1 + 0x234);
 4  v33.m128_f32[0] = v33.m128_f32[0] * 0.5;
 5  ucrtBase_Tanf();

Using CE for dynamic analysis, we can see that (a1 + 0x234) holds FovX in radians: “1.8326”, matching the Fov slider we set to “105 degrees” earlier.

IDA shows the load as (__m128)*(unsigned int *), which treats the value as an integer, but we know it is a float. The type doesn’t matter as long as it’s 4 bytes. At the CPU level, bits are just bits…
So this block loads FovX in radians into “v33” and immediately multiplies it by 0.5, so “v33” now holds fovX/2.
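The “bits are just bits” point can be checked with a small sketch that copies the same 4 bytes between an integer and a float. It assumes a 32-bit IEEE-754 float, and the helper names bits_to_float and float_to_bits are made up for illustration.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Reinterpret a 32-bit pattern as a float without changing any bits. */
static float bits_to_float(uint32_t bits)
{
    float f;
    assert(sizeof f == sizeof bits);      /* assumes a 4-byte float */
    memcpy(&f, &bits, sizeof f);
    return f;
}

/* The reverse direction: read the raw bit pattern of a float. */
static uint32_t float_to_bits(float f)
{
    uint32_t bits;
    assert(sizeof f == sizeof bits);
    memcpy(&bits, &f, sizeof bits);
    return bits;
}
```

Under that assumption, bits_to_float(0x3F800000u) is 1.0f and bits_to_float(0xBF800000u) is -1.0f, which explains the odd-looking (__m128)0x3F800000u and (__m128)0xBF800000 constants in the pseudocode.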

Next we see a call to ucrtBase_Tanf(), for which IDA has failed to assign arguments. No worries, we will look at the assembly to find its arguments.

Following the Windows x64 calling convention, xmm0 holds the argument and the result is returned in xmm0 as well. Tracing the assembly we see:

ESP-Image1

[rbx+234h] is our fovX which gets stored into xmm0, gets multiplied by 0.5 and used as an arg for tanf call. After the call the lowest 32-bits of xmm0 register will hold tan(fovX/2).

After this block there are lines that use FovX which I have not highlighted. This is because they are either deriving FovY from FovX or saving the current value of v33 for later use.

Next block is:

 9  ucrtBase_aTanf();
11  v36 = v33;
12  v36.m128_f32[0] = v33.m128_f32[0] * 2.0;

Moving on to the next line, we see a redundancy, or a quirk: a call to ucrtBase_aTanf(). Looking at the assembly, the argument is v33 again, so now “v33 = atanf(tanf(fovX/2))”, which equals “fovX/2”.

Why?

FovX can only have a value from 60 to 120 degrees in-game. Since we divided it by 2, the angle is between 30 and 60 degrees (well within the -π/2 to π/2 principal bounds of arctan), meaning arctan(tan(x)) = x. Calling atanf just to get back fovX/2 wastes CPU cycles, but the cost is negligible.

The identity arctan(tan(x)) = x holds if and only if x lies strictly inside (−90°, 90°). Our angle always stays within 30° to 60°, so here the identity holds without exception.

Then:

v36 is initialized with the value of v33 (fovX / 2) and immediately multiplied by 2.0, bringing it back to the original fovX.

Next block is:

19  v36.m128_f32[0] = v36.m128_f32[0] * 0.5;
20  ucrtBase_Tanf();
23  v38 = v36;
24  v37.m128_f32[0] = 1.0 / v36.m128_f32[0];

v36 was fovX; now it’s fovX/2 after “v36 * 0.5”.

Then it calls ucrtBase_Tanf() with v36 as the argument, so the result in v36 is tan(fovX/2).

Next it saves tan(fovX/2) into v38 for later calculations and finally does “v37.m128_f32[0] = 1.0 / v36.m128_f32[0];”, completing our calculation for xScale and saving it inside v37.

\[x_{scale} = \frac{1}{\tan{\left(\frac{Fov_X}{2}\right)}}\]

yScale Construction:

ESP-Image1

Let’s start with this block:

 6  v34 = v33;
 8  v34.m128_f32[0] = v33.m128_f32[0] / *(float *)(a1 + 0x18);
10  v35 = (__m128)*(unsigned int *)(a1 + 0x430);

v34 is first assigned the value of tan(fovX/2) (as an __m128, so only the lowest 32 bits hold the value), then divided by the value at *(float *)(a1 + 0x18). With dynamic analysis we can see it is a constant “1.777”, which is our 16:9 aspect ratio. The interesting part is that it stays constant and won’t change even when the aspect ratio is 4:3 or 16:10.

So now v34 holds the value “tan(fovX/2) / Aspect Ratio”. Hmm, this formula looks familiar…

\[\tan\left(\frac{FOV_X}{2}\right) \,/\, A = \tan\left(\frac{FOV_Y}{2}\right)\]
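If the relation is not obvious, it can be derived in two lines. As an assumption for this sketch, take a projection plane of width w and height h at distance d from the camera, with A = w/h:

\[\tan\left(\frac{FOV_X}{2}\right) = \frac{w/2}{d}, \qquad \tan\left(\frac{FOV_Y}{2}\right) = \frac{h/2}{d}, \qquad A = \frac{w}{h}\]

\[\frac{\tan\left(\frac{FOV_X}{2}\right)}{A} = \frac{w}{2d} \cdot \frac{h}{w} = \frac{h}{2d} = \tan\left(\frac{FOV_Y}{2}\right)\]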

In the next line a value from (__m128)*(unsigned int *)(a1 + 0x430) is loaded into v35. With dynamic analysis we can see most of the time this is zero.

Next block:

13  if ( v35.m128_f32[0] == 0.0 )
14  {
15    ucrtBase_aTanf();
16    v35 = v34;
17    v35.m128_f32[0] = v34.m128_f32[0] * 2.0;
18  }

If the value loaded from (a1 + 0x430) into v35 is zero, then it will calculate the value of FovY from FovX with this formula:

\[FOV_Y = 2 \cdot \tan^{-1}\!\left(\frac{\tan\!\left(\frac{FOV_X}{2}\right)}{A}\right)\]

and save it inside “v35”.

v34 previously held “tan(fovX/2) / Aspect Ratio”, so after the ucrtBase_aTanf() call with “v34” as its argument, “v34” holds atan(tan(fovX/2) / AspectRatio). This gets copied into v35, and immediately afterwards v34 multiplied by 2 is written into v35. So now v35 has the value:

“2 * atan(tan(fovX/2) / AspectRatio)”, matching our formula exactly!

Next block:

22  v35.m128_f32[0] = v35.m128_f32[0] * 0.5;
25  ucrtBase_Tanf();
29  v42 = v35;
32  v44.m128_f32[0] = 1.0 / v35.m128_f32[0];

Next it halves “v35” to get FovY/2, then calls ucrtBase_Tanf() with “v35” as the argument, so the value inside “v35” is tan(fovY/2). That value is saved into “v42” for later calculations.

Next it completes the calculation for yScale by doing:

v44.m128_f32[0] = 1.0 / v35.m128_f32[0]; so the final value inside “v44” is “1/tan(fovY/2)”.

So now v44 = 1/tan(fovY/2) and v37 = 1/tan(fovX/2) (Only the lowest 32-bits are used, rest are 0’s).

Putting It All Together: The Cleaned Code

Now that we understand the math behind both the X and Y scales, we can go back into IDA, rename our variables, and comment the IDA pseudo code.

 1  else
 2  {
      // --- Initial Setup & FovX ---
      // Load FovX (radians) from a1 + 0x234
 3    fovX_calc = (__m128)*(unsigned int *)(a1 + 0x234);
 4    fovX_calc.m128_f32[0] = fovX_calc.m128_f32[0] * 0.5;
      // Arg: fovX_calc. Result: fovX_calc = tan(FovX / 2)
 5    ucrtBase_Tanf();
 6    fovY_calc = fovX_calc;
 7    tanFovXby2_dup = fovX_calc.m128_f32[0];
      // Divide by Aspect Ratio at a1 + 0x18 (hardcoded at 1.777)
      // fovY_calc = tan(FovX / 2) / AspectRatio
 8    fovY_calc.m128_f32[0] = fovX_calc.m128_f32[0] / *(float *)(a1 + 0x18);
      // --- The Redundant Call ---
      // Arg: fovX_calc. Result: atan(tan(FovX / 2)). Reverts back to FovX / 2
 9    ucrtBase_aTanf();
      // --- FovY Calculation ---
      // Check if FovY is pre-calculated (usually 0)
10    fovY_val = (__m128)*(unsigned int *)(a1 + 0x430);
11    fovX_val = fovX_calc;
      // fovX_val = (FovX / 2) * 2.0 -> FovX
12    fovX_val.m128_f32[0] = fovX_calc.m128_f32[0] * 2.0;
13    if ( fovY_val.m128_f32[0] == 0.0 )
14    {
        // Arg: fovY_calc. Result: atan(tan(FovX / 2) / AspectRatio)
15      ucrtBase_aTanf();
16      fovY_val = fovY_calc;
        // fovY_val = 2 * atan(tan(FovX / 2) / AspectRatio)
17      fovY_val.m128_f32[0] = fovY_calc.m128_f32[0] * 2.0;
18    }
      // fovX_val = FovX / 2
19    fovX_val.m128_f32[0] = fovX_val.m128_f32[0] * 0.5;
      // Arg: fovX_val. Result: fovX_val = tan(FovX / 2)
20    ucrtBase_Tanf();
21    xScale = (__m128)0x3F800000u; // 1.0f
      // fovY_val = FovY / 2
22    fovY_val.m128_f32[0] = fovY_val.m128_f32[0] * 0.5;
      // Save tan(FovX / 2) for later calculations
23    tan_fovX_half_saved = fovX_val;
      // --- Final xScale Calculation ---
      // xScale = 1.0 / tan(FovX / 2)
24    xScale.m128_f32[0] = 1.0 / fovX_val.m128_f32[0];
      // Arg: fovY_val. Result: fovY_val = tan(FovY / 2)
25    ucrtBase_Tanf();
26    v39 = (__m128)*(unsigned int *)(a1 + 0x428);
27    v40 = (__m128 *)(a1 + 0xF0);
28    v41 = a1 + 0x130;
      // Save tan(FovY / 2) for later calculations
29    tan_fovY_half_saved = fovY_val;
30    v43 = (__m128)0x3F800000u; // 1.0f
31    yScale = (__m128)0x3F800000u; // 1.0f
      // --- Final yScale Calculation ---
      // yScale = 1.0 / tan(FovY / 2)
32    yScale.m128_f32[0] = 1.0 / fovY_val.m128_f32[0];

Since this part is getting too long, the reversal of the depth mapping calculations will be done in the next part!

In Part 5.2, we will look at how the engine uses zNear and zFar for depth mapping, and how it uses SIMD instructions like _mm_unpacklo_ps to pack all of these isolated variables into the final 4x4 projection matrix in memory.