Math at Scale: Reversing the AAA Projection Matrix

Part 1: Introduction

2026-04-03T18:30:00+00:00

⚠️ Disclaimer:

The content on this blog is provided for educational and informational purposes only. I do not encourage, endorse, or support any illegal activity. Any techniques, tools, or concepts discussed are intended solely for learning, research, defensive security, or experimentation in safe, legal environments. Use this information responsibly and at your own risk. The author is not responsible for any misuse or legal consequences that may result from applying what’s written here.

Prerequisites:

Theoretical Knowledge:

Linear Algebra:
- Comfort with matrix-vector multiplication, dot products, and coordinate spaces
  (Model → World → View → Projection)
Trigonometry
x86-64 Assembly & SIMD:
- Ability to read assembly with a basic understanding of SSE/AVX instruction sets

Technical Skills:

Static Analysis: Experience navigating large binaries in IDA Pro or Ghidra.
Dynamic Analysis & Memory Research: Cheat Engine for finding addresses and tracing functions.
Importantly: Comfort reasoning through ambiguity.

Tools Used:

Disassembler: IDA Pro (Ghidra/Binary Ninja are fine alternatives).
Memory Scanner: Cheat Engine (For finding the “live” variables like FOV or Aspect Ratio).

I will not be covering the “how-to” for basic tool usage.

What’s the point of doing this?

Mainly for:

1. Modding & Engine Fixes:

If you understand how the projection matrix is built, you can modify its construction to do basically whatever you want, for example:

Aspect Ratio Correction / Ultra Wide support:
- If a game doesn’t natively support ultrawide or hardcodes its aspect ratio, we can modify the values that feed the X or Y scale: either the 1/tan(fov/2) term or the aspect ratio multiplier/divisor (e.g., changing 1.777 to 2.333). The same approach fixes games that use “Vert-” scaling (where the top and bottom are cut off) by forcing “Hor+” scaling instead.
Frustum Manipulation:
- Extending the Far plane, modifying the Near plane, or even switching from standard depth to a Reversed-Z buffer (shader logic must also be changed for this)
Wider FOV:
- Bypassing hardcoded FOV sliders and going beyond the limits imposed by developers
Temporal Anti-Aliasing (TAA) & DLSS Jitter injection:
- I’ve been working with devs from Luma who add DLSS to games which never had TAA / had only a few jitters. Modern upscalers need sub-pixel offsets to function properly. If you reverse the construction, you can inject a Halton Sequence or other jitter distributions directly into the projection values to ensure the engine renders slightly different pixel positions every frame.

2. Performance Optimization:

Reversing the construction of the Projection matrix (or other paths in the rendering pipeline) gives you instruction-level insight into how the engine handles its most critical per-frame math. It can tell you whether the engine is wasting cycles and whether you could write a more efficient replacement in inline assembly. Most engines are pretty robust and won’t have fundamental flaws that tank CPU efficiency, but you can never be too sure. The Dunia Engine used for Far Cry New Dawn seems to have various optimization issues, but that’s a blog post for another day.

Write-Up Outline:

Snippets of what we will be working with:

IDA C pseudo code:

SIMD Assembly:

Cheat Engine:

The game engine that will be used as an example is Far Cry New Dawn’s Dunia Engine.

The reverse engineering insights from this engine can be applied to other engines as well.

Part 2: What is the Projection Matrix?

2026-04-03T18:30:00+00:00

I previously touched on this here: Reversing The ViewProjection Matrix (Part 2.2), but we need to establish a baseline before we start tearing apart engine binaries.

I won’t be giving a deep-dive computer graphics lecture here (Frustum plane derivations, complex perspective math, etc.). Instead, we will learn how textbooks do the math vs. how real Game Engines do the math.

The Bare Minimum: What is it?

If the View Matrix acts as the camera’s position and rotation in the world, the Projection Matrix is the camera’s lens. It compresses the 3D world flat onto your 2D monitor.

It does this by defining a View Frustum (a 3D truncated pyramid of space) using four variables:

Field of View (FOV): How wide the lens is.
Aspect Ratio: Your monitor’s dimensions (16:9, 21:9).
Near & Far Planes: The minimum and maximum depth the camera can see.

Anything inside this pyramid gets rendered to your screen. Anything outside gets clipped. That is all the theory you need for now.

The Memory Layout (What We Actually Care About)

As reverse engineers, understanding the theory is secondary to understanding the memory layout. When you are scanning memory in Cheat Engine, a projection matrix just looks like a contiguous block of 16 floating-point numbers.

To spot it in the sea of data, you have to know what those 16 floats represent. Here is the standard perspective projection matrix layout (DirectX):

\[P = \begin{bmatrix} x_{scale} & 0 & 0 & 0 \\ 0 & y_{scale} & 0 & 0 \\ 0 & 0 & \dfrac{z_{far}}{z_{far} - z_{near}} & 1 \\ 0 & 0 & -\dfrac{z_{near} \cdot z_{far}}{z_{far} - z_{near}} & 0 \end{bmatrix}\]

The formulas used for the zNear and zFar depth mapping (Rows 2 and 3) are not universal and can differ engine to engine, convention to convention. What actually matters is that after projection and perspective divide (in DirectX convention), the near plane maps to 0 and the far plane maps to 1 (or the opposite in Reversed-Z.)

However, the X and Y scales (the floats at [0][0] and [1][1]) are almost always derived directly from the Field of View using tangent math:

\[x_{scale} = \frac{1}{\tan{\left(\frac{Fov_X}{2}\right)}}\] \[y_{scale} = \frac{1}{\tan{\left(\frac{Fov_Y}{2}\right)}}\]

This is our golden ticket. If we know the game’s FOV, we can calculate the exact floating-point value for xScale, and use Cheat Engine to hunt down that exact float in memory.
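To make the layout concrete, here is a small Python sketch that builds the textbook matrix shown above and checks the depth mapping. The function names and the sample FOV and plane values are mine, picked purely for illustration; nothing here is taken from an engine.

```python
import math

def perspective_d3d(fov_x_deg, fov_y_deg, z_near, z_far):
    """Textbook row-vector (DirectX-style) perspective matrix as 4 rows."""
    x_scale = 1.0 / math.tan(math.radians(fov_x_deg) / 2.0)
    y_scale = 1.0 / math.tan(math.radians(fov_y_deg) / 2.0)
    q = z_far / (z_far - z_near)
    return [
        [x_scale, 0.0, 0.0, 0.0],
        [0.0, y_scale, 0.0, 0.0],
        [0.0, 0.0, q, 1.0],
        [0.0, 0.0, -z_near * q, 0.0],
    ]

def project_depth(m, z):
    """Depth after the perspective divide for the view-space point (0, 0, z, 1)."""
    clip_z = z * m[2][2] + m[3][2]
    clip_w = z * m[2][3] + m[3][3]
    return clip_z / clip_w

# Hypothetical camera: 90 degree FovX, 59 degree FovY, near 0.1, far 1000.
P = perspective_d3d(90.0, 59.0, 0.1, 1000.0)
flat = [value for row in P for value in row]  # the 16 floats as laid out in memory
print(flat)
print(project_depth(P, 0.1), project_depth(P, 1000.0))  # near plane, far plane
```

The flattened list is the pattern to look for in a memory viewer: two non-zero scale values on the diagonal, a depth pair in column 2, a lone 1 at [2][3], and zeros everywhere else.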

A Projection Matrix standing out in a sea of floats.

Let’s look at the exact methods we use to hunt this matrix down in Part 3.

Part 3: Methods to Find The Projection Matrix

2026-04-03T18:30:00+00:00

I will be going over the methods that I use to find the projection matrix. Keep in mind these are just the methods I personally use; easier methods probably exist.

Our workflow will go like so:
Find Projection Matrix -> Trace Construction -> Reverse Construction
10 times harder than it seems…

Methods to find Projection Matrix

1. The 1,-1 / Up-Down Trick:

First, look all the way up in game and search for the value “1” in Cheat Engine, then look all the way down and search for the value “-1”. Repeat this process until you narrow down the results.
When you finish narrowing it down, you will have the View or Camera Matrix. The trick is that these matrices usually live in some kind of camera structure that stores several matrices close together in memory, so there is a high chance that one of the neighbouring matrices is the projection matrix.

If you do not find a Projection Matrix but instead find a View-Projection Matrix and a View Matrix, you can derive the Projection Matrix using matrix algebra:

If the game uses the row-vector convention (common in DirectX), where the multiplication order is:
$VP = V \times P$
You can solve for P by left-multiplying both sides by the inverse of the View matrix ($V^{-1}$):
$P = V^{-1} \times VP$
If the game uses the column-vector convention (common in OpenGL), where the multiplication order is:
$VP = P \times V$
Then, you can solve for P by right-multiplying both sides by the inverse of the View matrix ($V^{-1}$):
$P = VP \times V^{-1}$

Then simply search for the Projection Values!
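The algebra above can be sketched in a few lines of Python. This is only an illustration under my own assumptions: the helper names are hypothetical, the sample View and Projection values are made up, and the matrices are plain 4x4 lists of rows as you would copy them out of Cheat Engine.

```python
import math

def mat_mul(a, b):
    """Plain 4x4 matrix product a * b."""
    return [[sum(a[i][k] * b[k][j] for k in range(4)) for j in range(4)]
            for i in range(4)]

def mat_inverse(m):
    """Gauss-Jordan inverse with partial pivoting for a 4x4 matrix."""
    n = 4
    aug = [list(m[i]) + [1.0 if i == j else 0.0 for j in range(n)] for i in range(n)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        if abs(aug[pivot][col]) < 1e-12:
            raise ValueError("matrix is singular")
        aug[col], aug[pivot] = aug[pivot], aug[col]
        p = aug[col][col]
        aug[col] = [v / p for v in aug[col]]
        for r in range(n):
            if r != col:
                f = aug[r][col]
                aug[r] = [rv - f * cv for rv, cv in zip(aug[r], aug[col])]
    return [row[n:] for row in aug]

def recover_projection(view, view_proj, row_vector=True):
    """P = V^-1 * VP (row-vector) or P = VP * V^-1 (column-vector)."""
    inv_view = mat_inverse(view)
    return mat_mul(inv_view, view_proj) if row_vector else mat_mul(view_proj, inv_view)

# Made-up row-vector View matrix: 30 degree yaw plus a translation in the last row.
c, s = math.cos(math.radians(30)), math.sin(math.radians(30))
V = [[c, 0.0, -s, 0.0], [0.0, 1.0, 0.0, 0.0], [s, 0.0, c, 0.0], [5.0, -2.0, 10.0, 1.0]]
# Made-up Projection matrix in the textbook layout.
P = [[0.7673, 0.0, 0.0, 0.0], [0.0, 1.3641, 0.0, 0.0],
     [0.0, 0.0, 1.0001, 1.0], [0.0, 0.0, -0.10001, 0.0]]
VP = mat_mul(V, P)                      # what the game would store
P_recovered = recover_projection(V, VP)  # what we scan for afterwards
print(P_recovered[0][0], P_recovered[1][1])
```

If the recovered matrix does not look like a projection matrix at all, the game is most likely using the other multiplication order, so try again with `row_vector=False`.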

2. Focal Length Scanning:

For this method to work the game must expose its FOV. This is usually exposed in the settings in degrees (usually FovX). The first term (xScale) of the projection matrix is 1/tan(fovX/2), so in Cheat Engine we search for the result of that expression: convert the degrees shown on the FOV slider to radians, compute 1/tan(fovX/2), and scan for the resulting float. This narrows down results much faster than method 1.
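Here is a quick Python helper for that conversion. The function names and the tolerance are my own choices; the small window is there because the value you type into the scanner is rounded, so searching for a narrow range is usually safer than searching for an exact float.

```python
import math

def x_scale_from_fov(fov_deg):
    """xScale = 1 / tan(FovX / 2), with FovX given in degrees."""
    return 1.0 / math.tan(math.radians(fov_deg) / 2.0)

def scan_window(fov_deg, tolerance=0.0005):
    """Lower and upper bound for a 'value between' style float scan."""
    value = x_scale_from_fov(fov_deg)
    return value - tolerance, value + tolerance

for fov in (60, 90, 105):
    low, high = scan_window(fov)
    print(f"FovX {fov:>3} deg -> xScale {x_scale_from_fov(fov):.4f} (scan {low:.4f} .. {high:.4f})")
```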

3. Brute Force:

This is the most time-consuming method. If the game does not give any hints about its FOV (the FOV slider is not available, or the values on the slider are not real degrees, etc.), we can still use this method without being a full-blown reverse engineer:

If Fov slider is not exposed:
Before we get into the method, a bit of theory. This is the relation between FovX and FovY: $\tan\left(\frac{\mathrm{FOV}_x}{2}\right) = AR \cdot \tan\left(\frac{\mathrm{FOV}_y}{2}\right)$. In terms of the matrix entries, it means $y_{scale} = AR \cdot x_{scale}$.
Now we can’t change the FOV, but we can change the Aspect Ratio! (Hopefully)
The trick is in the “Increased Value” and “Decreased Value” scan methods of Cheat Engine. If you increase the Aspect Ratio (e.g., moving from 16:9 to 21:9), a Hor+ engine must change the xScale value in the Projection Matrix to prevent the image from stretching. So: increasing the aspect ratio causes xScale to decrease, and vice versa.

Keep in mind that the engine might use Vert- scaling instead of Hor+ scaling. In that case xScale stays the same and yScale increases as the aspect ratio grows.

If Fov slider has no real values:
It’s the same principle: the “Increased Value” and “Decreased Value” scan methods of Cheat Engine are your friends!
Increasing FovX → xScale decreases, and vice versa.
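To see which value should move in which direction, here is a rough Python sketch of the two scaling behaviours. The function names and the 60 degree and 90 degree sample angles are assumptions for illustration; a real engine may mix both behaviours or clamp the result.

```python
import math

def scales_hor_plus(fov_y_deg, aspect):
    """Hor+: FovY is fixed, so yScale is constant and xScale = yScale / AR."""
    y_scale = 1.0 / math.tan(math.radians(fov_y_deg) / 2.0)
    return y_scale / aspect, y_scale

def scales_vert_minus(fov_x_deg, aspect):
    """Vert-: FovX is fixed, so xScale is constant and yScale = xScale * AR."""
    x_scale = 1.0 / math.tan(math.radians(fov_x_deg) / 2.0)
    return x_scale, x_scale * aspect

for name, aspect in (("16:9", 16.0 / 9.0), ("21:9", 21.0 / 9.0)):
    hx, hy = scales_hor_plus(60.0, aspect)
    vx, vy = scales_vert_minus(90.0, aspect)
    print(f"{name}: Hor+ x={hx:.4f} y={hy:.4f} | Vert- x={vx:.4f} y={vy:.4f}")
```

So after widening the window, scan “Decreased Value” when hunting xScale in a Hor+ game, and scan “Increased Value” for yScale in a Vert- game.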

4. Tracing Flow:

This one is difficult for most people: you need a keen eye for recognizing 4x4 matrix multiplies and projection matrix construction code in IDA.

The key idea is that you tried method 1 and couldn’t find the projection matrix anywhere close in memory, so you trace the flow of the matrices you did find. Got the Camera or View Matrix? Great!
Use the “Find out what accesses this address” function in Cheat Engine and it will show you the exact instructions, and therefore the functions, that use those matrices. Then decompile those functions in IDA. There are three different things I usually look for when decompiling them:

a) Large Matrix Construction Function

You would see a very large matrix construction function responsible for building many different matrices per frame, e.g. Proj, Cam, View, InvProj, InvView, VP, VP^-1, etc. You need a keen eye to isolate just the projection construction. Ask yourself: Is it using values related to an aspect ratio (1.7777, 1.3333)? Is it calling tanf/atanf? Is it then computing 1 divided by the tanf result? Are zNear and zFar among its parameters?

Keep these derivations in mind as well :> (very helpful while trying to pin point it)

Let A denote the Aspect Ratio, defined as

\[A = \text{AspectRatio} = \frac{\text{Width}}{\text{Height}}\]

The relationship between $FOV_x$ and $FOV_y$ is:

\[\tan\left(\frac{FOV_X}{2}\right) = A \cdot \tan\left(\frac{FOV_Y}{2}\right)\]

From this, we can express each one in terms of the other:

\[FOV_X = 2 \cdot \tan^{-1}\!\left(A \cdot \tan\!\left(\frac{FOV_Y}{2}\right)\right)\] \[FOV_Y = 2 \cdot \tan^{-1}\!\left(\frac{\tan\!\left(\frac{FOV_X}{2}\right)}{A}\right)\]
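These two conversions are handy to keep around as a Python scratch helper while staring at decompiled code. The function names are mine; angles are in radians, as they usually are inside the engine.

```python
import math

def fov_x_from_fov_y(fov_y, aspect):
    """FOV_X = 2 * atan(A * tan(FOV_Y / 2)), angles in radians."""
    return 2.0 * math.atan(aspect * math.tan(fov_y / 2.0))

def fov_y_from_fov_x(fov_x, aspect):
    """FOV_Y = 2 * atan(tan(FOV_X / 2) / A), angles in radians."""
    return 2.0 * math.atan(math.tan(fov_x / 2.0) / aspect)

aspect = 16.0 / 9.0
fov_x = math.radians(105.0)
fov_y = fov_y_from_fov_x(fov_x, aspect)
print(math.degrees(fov_y))                             # vertical FOV in degrees
print(math.degrees(fov_x_from_fov_y(fov_y, aspect)))   # round trip back to FovX
```

If a function you are looking at takes one angle, divides or multiplies a tangent by something close to 1.7777, and then calls atanf, it is very likely doing one of these two conversions.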

Example Image:

b) 4x4 Matrix Multiply:

If you see heavy SIMD usage, specifically addps, shufps, and mulps where shufps uses the immediates 0x00, 0x55, 0xAA, and 0xFF to broadcast a single element across the register, you can be pretty sure it’s doing a 4x4 matrix multiply. Now look at the parameters and you will likely see it computing View * Projection (or Projection * View, depending on the row-vector vs column-vector convention), and boom, it led us to the projection matrix!
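Why exactly those four immediates? Each pair of bits in the shufps immediate selects one source lane, so 0x00, 0x55, 0xAA, and 0xFF select lane 0, 1, 2, or 3 for every output lane. Below is a minimal Python emulation (registers are modelled as 4-element lists with index 0 as the lowest float, and the helper names are mine) that also shows how one output row of a row-vector matrix multiply falls out of the broadcast, mulps, addps pattern.

```python
def shufps(a, b, imm8):
    """Emulate shufps: low two lanes are picked from a, high two lanes from b."""
    return [a[imm8 & 3], a[(imm8 >> 2) & 3], b[(imm8 >> 4) & 3], b[(imm8 >> 6) & 3]]

def broadcast_lane(imm8):
    """Return the lane index if imm8 selects the same lane four times, else None."""
    lanes = {(imm8 >> shift) & 3 for shift in (0, 2, 4, 6)}
    return lanes.pop() if len(lanes) == 1 else None

def row_times_matrix(row, m):
    """One output row of a row-vector 4x4 multiply, the way SIMD code does it."""
    acc = [0.0, 0.0, 0.0, 0.0]
    for imm8, m_row in zip((0x00, 0x55, 0xAA, 0xFF), m):
        splat = shufps(row, row, imm8)                          # broadcast row[i]
        acc = [x + s * r for x, s, r in zip(acc, splat, m_row)]  # mulps + addps
    return acc

for imm8 in (0x00, 0x55, 0xAA, 0xFF):
    print(hex(imm8), "broadcasts lane", broadcast_lane(imm8))

identity = [[1.0 if i == j else 0.0 for j in range(4)] for i in range(4)]
print(row_times_matrix([1.0, 2.0, 3.0, 4.0], identity))
```

Four of these per-row sequences back to back (16 mulps-style multiplies in total) is the fingerprint of a full 4x4 multiply.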

Example Image:

c) Large Matrix copy function

If you landed in a large matrix copy function, you have probably tracked down the final GPU-bound floats: a routine that just copies the finished matrices into a staging buffer to send to the GPU (or somewhere else).

You will have to trace further back: look at the stack, the call sites, and so on.

Example Image:

5. Pattern Recognition (Jitter Tables):

To understand this concept you need to understand what Temporal Anti-Aliasing is:
The basic principle behind this method is that TAA relies on different subpixel positions for anti-aliasing, and those are achieved by jittering the projection matrix every frame. If we know the exact pattern in the jitter table, we can scan for that pattern as a “Grouped” scan in Cheat Engine. Once you find the jitter table, use “Find out what accesses this address” and it will lead you to the code that jitters the projection matrix, and from there to the projection matrix itself. You will likely see the code read the jitter table, scale the values down by the current resolution X and Y, and then multiply the projection matrix by an identity matrix that has the subpixel jitter values placed inside it.

e.g. In almost all SMAA_T2X games the pattern is {0.25, -0.25} and {-0.25, 0.25} (Watch Dogs 2, Just Cause 3, and Dying Light all had this exact pattern for SMAA_T2X).

So in Cheat Engine select “Value Type” as “Grouped” and put “ f:0.25 f:-0.25 f:-0.25 f:0.25 “ in the Scan Box and you should get the Jitter array (Usually a static memory address)
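If you are working from a memory dump instead of a live process, the same idea can be roughly emulated offline in Python. This is only a sketch: `find_float_group` is a hypothetical helper, the dump below is fabricated for the demo, and the match is byte-exact (fine here, since 0.25 and -0.25 are exactly representable as 32-bit floats).

```python
import struct

def find_float_group(buffer, values):
    """Return byte offsets where the 32-bit floats in `values` appear back to back."""
    pattern = struct.pack("<%df" % len(values), *values)
    hits, start = [], 0
    while True:
        idx = buffer.find(pattern, start)
        if idx < 0:
            return hits
        hits.append(idx)
        start = idx + 4  # continue one float further on

# Fabricated dump: 16 bytes of unrelated floats, the jitter table, then more data.
dump = (struct.pack("<4f", 1.0, 0.0, 0.0, 0.0)
        + struct.pack("<4f", 0.25, -0.25, -0.25, 0.25)
        + struct.pack("<2f", 0.5, 0.5))

print(find_float_group(dump, [0.25, -0.25, -0.25, 0.25]))
```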

same method used in Watch Dogs 2

6. The Graphics Debugger Route (RenderDoc to Memory):

I will be honest, this is a method I haven’t personally had to rely on yet, but the theory is rock solid!

Instead of hunting blindly in memory, you start at the GPU level using a graphics debugger like RenderDoc. The workflow looks something like this:

a) Capture a frame of the game in RenderDoc.
b) Look through the Event Browser and select a substantial geometry draw call, usually something like DrawIndexed() with a high index count.
c) Inspect the Constant Buffers (CBuffer) bound to the Vertex Shader for that draw call.
d) The vertex shader mathematically requires the View and Projection matrices to transform local geometry into clip space. By looking through the bound buffers, you can physically see the 4x4 matrix floats sitting right there in the GPU state.
e) Once you spot the Projection Matrix, note it down.

Armed with those exact highly specific floats, you switch back to Cheat Engine and do a “Grouped” scan. Because you are searching for 4 or 5 exact float values in a specific order, you will usually find the CPU-side memory address instantly.

Once you have that address, use “Find out what accesses this address” in Cheat Engine. From there, you take the instruction address straight into IDA Pro.

You will probably reach the Projection Construction function.

Sometimes the constant buffers won’t even contain the Projection Matrix on its own. It will probably already be multiplied with other matrices.

Next, we will use these methods to find the projection matrix in Far Cry New Dawn.

Part 4: Finding and Tracing Projection Construction

2026-04-03T18:30:00+00:00

Let’s try and find the Projection Matrix in Memory:

In Far Cry New Dawn the FOV slider exposes the actual degrees used by the projection matrix, which makes it very easy to find.

Reminder that the xScale is:

\[\frac{1}{\tan\left(\frac{FOV_X}{2}\right)}\]

Now we have access to FovX. I will set the in-game slider to 105 degrees. Next, we convert it to radians and calculate the result:

\[\frac{1}{\tan{\left(\frac{1.8326}{2}\right)}} = 0.7673\]

And then we simply search for this value:

Now I change the FOV slider to 60 degrees, put it through the formula to get “1.732”, and search again. Repeat a few more times to narrow it down and we get:

Now verify all matrices manually till we find a Projection Matrix that is close to the textbook layout.

This seems to be a Projection Matrix that comes close but still not “Textbook layout”:

As I’ve said before Game engines need not follow convention

In the first Matrix we can see the expected values in [0][0] (xScale) and [1][1] (yScale) as well as Depth Mapping in [2][2] and [3][2], the unexpected value is 0.13 at [2][1] (which is supposed to be 0 in a standard layout). We will figure out what this anomaly is later. The second Matrix is simply the Inverse Projection.

Let’s try to find out “what writes to this address” using CE

We can see two instructions. Both seem to be from the same function, let’s decompile the function in IDA pro:

Far Cry New Dawn uses Denuvo Anti-Tamper. Because of the obfuscated nature of the binary, IDA Pro’s auto-analysis will loop endlessly and never finish. Make sure to turn off auto-analysis and analyze manually!

Looks like a mess but to a trained eye we can tell it is doing a Projection Matrix Construction!

Why?

Look at the 1/x’s, the x/2’s, and the various calls to tanf and atanf: a clear indicator of perspective projection matrix construction.

Visual Verification

Before reversing the function, we need to test if this is the real projection matrix used for rendering. I will be changing the FovX value accessed by the function

Next we will be Reversing this function using IDA pro.

Part 5.1: Reversing Construction of the Projection Matrix (X & Y Scales)

2026-04-03T18:30:00+00:00

Now that we have found the function responsible for constructing the Perspective Projection Matrix Let’s begin reversing it to have a clear understanding on how the game engine constructs this matrix every frame!

We won’t be reversing the entire function we have stumbled into. The function we found with CE’s “Find out what writes to this address” feature is a very large one responsible for constructing various matrices (Camera, View, Projection, Inverse Projection, Identity Scaler, ViewProjection, InvProjCamera), and it makes multiple calls to Matrix4x4Multiply(), Matrix4x4Inverse(), and even a Calculate View Frustum function. We will only be looking at the Projection and Inverse Projection construction.

Let’s look at where we were initially:

The highlighted block writes the first row of the Projection Matrix; it is the instruction responsible for updating our Projection Matrix every frame. Let’s zoom out a bit and see what’s really going on.

It seems to be a part of an if-else block where our instruction is being executed inside the else block.

else block:

if block:

Now we clearly see the big picture happening here, The else block is clearly constructing a Perspective Projection Matrix as per my reasoning in part-4. The if block seems to be constructing an Orthographic Projection Matrix, my reasoning being these lines:

v51.m128_f32[0] = 2.0 / *(float *)(a1 + 0x3C0); // 2/width

and

v55.m128_f32[0] = (float)(1.0 / *(float *)(a1 + 0x3C4)) + (float)(1.0 / *(float *)(a1 + 0x3C4)); // 2/height

The values inside a1 + 0x3C0 and a1 + 0x3C4 have been checked to be Width and Height using CE during runtime.

These are the expected xScale and yScale values for an Orthographic Projection matrix. And look where it is stored:

v54 = _mm_unpacklo_ps(_mm_unpacklo_ps(v51, (__m128)0LL), (__m128)0LL); *(__m128 *)(a1 + 0xF0) = v54; // v54 = [2/width, 0, 0, 0]

and

*(__m128 *)(a1 + 0x100) = _mm_unpacklo_ps((__m128)0LL, _mm_unpacklo_ps(v55, (__m128)0LL)); // Final Unpack = [0, 2/height, 0, 0]

But what the hell is an _mm_unpacklo_ps??

The basic theory is simple:

the “lo” in “unpacklo” means we are only targeting the lowest 64-bits inside a 128-bit register and “ps” stands for “Packed Single” which tells the cpu to treat the 128-bit register as four 32-bit floats.
This is basically what it does:

Suppose:

Register A: [ A3, A2, A1, A0 ] (arg1, where A0 is the lowest float)
Register B: [ B3, B2, B1, B0 ] (arg2)

Result (written lowest float first, i.e. in memory order):

[ A0, B0, A1, B1 ]

that’s all.
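A tiny Python emulation makes the nested unpacks from the ortho block easier to follow. Registers are modelled as 4-element lists in memory order (index 0 is the lowest float). The width and height are hypothetical sample values, and the 99.0 entries stand in for whatever stale data might sit in the upper lanes.

```python
def unpacklo_ps(a, b):
    """Emulate _mm_unpacklo_ps: interleave the two lowest floats of a and b."""
    return [a[0], b[0], a[1], b[1]]

ZERO = [0.0, 0.0, 0.0, 0.0]      # (__m128)0LL
width, height = 1920.0, 1080.0   # hypothetical values for a1 + 0x3C0 / a1 + 0x3C4

v51 = [2.0 / width, 99.0, 99.0, 99.0]   # only lane 0 is meaningful
v55 = [2.0 / height, 99.0, 99.0, 99.0]

row0 = unpacklo_ps(unpacklo_ps(v51, ZERO), ZERO)   # stored at a1 + 0xF0
row1 = unpacklo_ps(ZERO, unpacklo_ps(v55, ZERO))   # stored at a1 + 0x100

print(row0)  # [2/width, 0, 0, 0]
print(row1)  # [0, 2/height, 0, 0]
```

Notice that the second unpack also throws away whatever was in the upper lanes of the source register, which is why the compiler can get away with only ever computing lane 0.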

As a refresher, a standard Orthographic Projection Matrix looks like this:

\[\begin{bmatrix} \frac{2}{w} & 0 & 0 & 0 \\ 0 & \frac{2}{h} & 0 & 0 \\ 0 & 0 & \frac{1}{z_{far} - z_{near}} & 0 \\ 0 & 0 & -\frac{z_{near}}{z_{far} - z_{near}} & 1 \end{bmatrix}\]

Thus validating our belief that the if block is constructing an Orthographic Projection Matrix so the if-else logic would be:

if(shouldConstructOrthoProj) { ConstructOrthoProj(); } else { ConstructPerspectiveProj(); }

Now for the hard part, Reversing the complete logic inside the else block…

Reversing Construction of the Perspective-Projection Matrix

Let’s begin Reversing the else block:

1  else
2  {
3      v33 = (__m128)*(unsigned int *)(a1 + 0x234);
4      v33.m128_f32[0] = v33.m128_f32[0] * 0.5;
5      ucrtBase_Tanf();
6      v34 = v33;
7      tanFovXby2_dup = v33.m128_f32[0];
8      v34.m128_f32[0] = v33.m128_f32[0] / *(float *)(a1 + 0x18);
9      ucrtBase_aTanf();
10     v35 = (__m128)*(unsigned int *)(a1 + 0x430);
11     v36 = v33;
12     v36.m128_f32[0] = v33.m128_f32[0] * 2.0;
13     if ( v35.m128_f32[0] == 0.0 )
14     {
15         ucrtBase_aTanf();
16         v35 = v34;
17         v35.m128_f32[0] = v34.m128_f32[0] * 2.0;
18     }
19     v36.m128_f32[0] = v36.m128_f32[0] * 0.5;
20     ucrtBase_Tanf();
21     v37 = (__m128)0x3F800000u;
22     v35.m128_f32[0] = v35.m128_f32[0] * 0.5;
23     v38 = v36;
24     v37.m128_f32[0] = 1.0 / v36.m128_f32[0];
25     ucrtBase_Tanf();
26     v39 = (__m128)*(unsigned int *)(a1 + 0x428);
27     v40 = (__m128 *)(a1 + 0xF0);
28     v41 = a1 + 0x130;
29     v42 = v35;
30     v43 = (__m128)0x3F800000u;
31     v44 = (__m128)0x3F800000u;
32     v44.m128_f32[0] = 1.0 / v35.m128_f32[0];
33     v45 = v32;
34     v45.m128_f32[0] = v32.m128_f32[0] - v31.m128_f32[0];
35     v46 = (__m128)*(unsigned int *)(a1 + 0x42C);
36     *(__m128 *)(a1 + 0x100) = _mm_unpacklo_ps((__m128)0LL, _mm_unpacklo_ps(v44, (__m128)0LL));
37     v43.m128_f32[0] = 1.0 / (float)(v32.m128_f32[0] - v31.m128_f32[0]);
38     *(__m128 *)(a1 + 0xF0) = _mm_unpacklo_ps(_mm_unpacklo_ps(v37, (__m128)0LL), (__m128)0LL);
39     v47 = v43;
40     v47.m128_f32[0] = v43.m128_f32[0] * v31.m128_f32[0];
41     *(__m128 *)(a1 + 0x110) = _mm_unpacklo_ps(_mm_unpacklo_ps(v39, v47), _mm_unpacklo_ps(v46, (__m128)0xBF800000));
42     v31.m128_f32[0] = v31.m128_f32[0] * v32.m128_f32[0];
43     v48 = v31;
44     v48.m128_f32[0] = v31.m128_f32[0] * v43.m128_f32[0];
45     v45.m128_f32[0] = v45.m128_f32[0] / v31.m128_f32[0];
46     v49 = _mm_unpacklo_ps(_mm_unpacklo_ps((__m128)0LL, v48), (__m128)0LL);
47     v50 = (__m128)0x3F800000u;
48     *(__m128 *)(a1 + 0x120) = v49;
49     v42.m128_f32[0] = v35.m128_f32[0] * *(float *)(a1 + 0x42C);
50     v38.m128_f32[0] = v36.m128_f32[0] * *(float *)(a1 + 0x428);
51     v50.m128_f32[0] = 1.0 / v32.m128_f32[0];
52     *(__m128 *)(a1 + 0x130) = _mm_unpacklo_ps(_mm_unpacklo_ps(v36, (__m128)0LL), (__m128)0LL);
53     *(__m128 *)(a1 + 0x140) = _mm_unpacklo_ps((__m128)0LL, _mm_unpacklo_ps(v35, (__m128)0LL));
54     *(__m128 *)(a1 + 0x150) = _mm_unpacklo_ps((__m128)0LL, _mm_unpacklo_ps((__m128)0LL, v45));
55     *(__m128 *)(a1 + 0x160) = _mm_unpacklo_ps(_mm_unpacklo_ps(v38, (__m128)0xBF800000), _mm_unpacklo_ps(v42, v50));
56     *(float *)(a1 + 0x1C) = 1.0 / fminf(tanFovXby2, v34.m128_f32[0]);
57 }

xScale Construction:

Because the compiler interleaved the instructions for optimization, the calculation for the X and Y scales are tangled together. Let’s isolate just the xScale logic:

Lines highlighted in red are responsible for xScale calculation.

Let’s start with this block

3  v33 = (__m128)*(unsigned int *)(a1 + 0x234);
4  v33.m128_f32[0] = v33.m128_f32[0] * 0.5;
5  ucrtBase_Tanf();

Using CE for dynamic analysis we can see that (a1 + 0x234) is FovX in radians or “1.8326” matching our Fov slider we set to “105 degrees” before.

It is loaded as (__m128)*(unsigned int *), which treats it as an integer, but we know it is a float. The type doesn’t matter as long as it’s 4 bytes; at the CPU level, bits are just bits…
So basically it loads FovX in radians into “v33”, then immediately multiplies it by 0.5, so “v33” now holds fovX/2.
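The “bits are just bits” point is easy to check with Python’s struct module, which is also a handy way to decode the raw hex constants IDA shows in this function. The helper names are mine.

```python
import struct

def bits_to_float(bits):
    """Reinterpret a 32-bit unsigned integer as an IEEE-754 single."""
    return struct.unpack("<f", struct.pack("<I", bits))[0]

def float_to_bits(value):
    """Reinterpret an IEEE-754 single as a 32-bit unsigned integer."""
    return struct.unpack("<I", struct.pack("<f", value))[0]

print(bits_to_float(0x3F800000))    # 1.0
print(bits_to_float(0xBF800000))    # -1.0
print(hex(float_to_bits(1.8326)))   # the bit pattern stored for FovX = 105 degrees
```

So whenever the pseudo code shows (__m128)0x3F800000u, read it as “a register whose lowest float is 1.0”.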

Next we see a call to ucrtBase_Tanf(), to which IDA has failed to assign arguments. No worries, we will look at the assembly for its arguments.

Following the Windows x64 calling convention, xmm0 holds the first floating-point argument, and the result is returned in xmm0 as well. Tracing the assembly we see:

[rbx+234h] is our fovX which gets stored into xmm0, gets multiplied by 0.5 and used as an arg for tanf call. After the call the lowest 32-bits of xmm0 register will hold tan(fovX/2).

After this block there are lines that use FovX which I have not highlighted. They either derive FovY from FovX or save the current value of v33 for later use.

Next block is:

9  ucrtBase_aTanf();
11 v36 = v33;
12 v36.m128_f32[0] = v33.m128_f32[0] * 2.0;

Moving on to the next line we see a redundancy, or a quirk: a ucrtBase_aTanf() call whose argument, per the assembly, is v33 again. So now “v33 = atanf(tanf(fovX/2))”, which equals “fovX/2”.

Why?

FovX can only have a value from 60 to 120 degrees in-game. Since we divided it by 2, the angle is between 30 and 60 degrees (well within the -π/2 to π/2 principal bounds of arctan), meaning arctan(tan(x)) = x. Calling atanf just to get back fovX/2 is a waste of CPU cycles, but a negligible one.

The identity arctan(tan(x)) = x holds if and only if x lies strictly inside (−90°, 90°); within that interval it holds without exception.
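A quick numerical check in Python, using the half-angles the in-game slider can produce plus one angle outside the principal range for contrast (the helper name is mine):

```python
import math

def atan_tan_roundtrip(x):
    """Return atan(tan(x)) for x in radians."""
    return math.atan(math.tan(x))

# Half-angles reachable from the in-game slider (60 to 120 degrees of FovX).
for half_deg in (30.0, 52.5, 60.0):
    x = math.radians(half_deg)
    print(half_deg, math.degrees(atan_tan_roundtrip(x)))   # comes back unchanged

# Outside (-90, 90) degrees the round trip wraps into the principal range.
print(math.degrees(atan_tan_roundtrip(math.radians(100.0))))  # about -80
```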

then:

v36 is initialized with the value of v33 (fovX / 2) and immediately multiplied by 2.0, bringing it back to the original fovX

Next block is:

19 v36.m128_f32[0] = v36.m128_f32[0] * 0.5;
20 ucrtBase_Tanf();
23 v38 = v36;
24 v37.m128_f32[0] = 1.0 / v36.m128_f32[0];

v36 was fovX; after “v36 * 0.5” it is fovX/2.

Then it calls ucrtBase_Tanf() with v36 as the argument, so the result in v36 is tan(fovX/2).

Next it saves tan(fovX/2) into v38 for later calculations and finally does “v37.m128_f32[0] = 1.0 / v36.m128_f32[0];”, completing our calculation for xScale and saving it inside v37.

\[x_{scale} = \frac{1}{\tan{\left(\frac{Fov_X}{2}\right)}}\]

yScale Construction:

Let’s start with this block:

6  v34 = v33;
8  v34.m128_f32[0] = v33.m128_f32[0] / *(float *)(a1 + 0x18);
10 v35 = (__m128)*(unsigned int *)(a1 + 0x430);

v34 is first assigned tan(fovX/2) (as an __m128, so only the lowest 32 bits hold the value) and is then divided by the value at *(float *)(a1 + 0x18). Dynamic analysis shows this is a constant “1.777”, our 16:9 aspect ratio. The interesting part is that it is constant: it does not change even when the aspect ratio is 4:3 or 16:10.

So now v34 holds the value “tan(fovX/2) / Aspect Ratio”, Hmm this formula looks familiar…

\[\tan\left(\frac{FOV_X}{2}\right) \,/\, A = \tan\left(\frac{FOV_Y}{2}\right)\]

In the next line a value from (__m128)*(unsigned int *)(a1 + 0x430) is loaded into v35. With dynamic analysis we can see most of the time this is zero.

Next block:

13 if ( v35.m128_f32[0] == 0.0 )
14 {
15     ucrtBase_aTanf();
16     v35 = v34;
17     v35.m128_f32[0] = v34.m128_f32[0] * 2.0;
18 }

If the value loaded from a1 + 0x430 (held in v35) is zero, the engine calculates FovY from FovX with this formula:

\[FOV_Y = 2 \cdot \tan^{-1}\!\left(\frac{\tan\!\left(\frac{FOV_X}{2}\right)}{A}\right)\]

and save it inside “v35”

v34 previously held “tan(fovX/2) / AspectRatio”, so after the ucrtBase_aTanf() call with “v34” as its argument, “v34” holds atan(tan(fovX/2) / AspectRatio). This is copied into v35, and immediately afterwards v34 multiplied by 2 is stored into v35. So now v35 has the value:

“2 * atan(tan(fovX/2) / AspectRatio)”, matching our formula exactly!

Next block:

22 v35.m128_f32[0] = v35.m128_f32[0] * 0.5;
25 ucrtBase_Tanf();
29 v42 = v35;
32 v44.m128_f32[0] = 1.0 / v35.m128_f32[0];

Next it halves “v35”, so it holds FovY/2, and calls ucrtBase_Tanf() with “v35” as the argument, so the value inside “v35” becomes tan(fovY/2). That value is copied into “v42” for later calculations.

Finally it completes the calculation for yScale:

v44.m128_f32[0] = 1.0 / v35.m128_f32[0]; so the final value inside “v44” is “1/tan(fovY/2)”

So now v44 = 1/tan(fovY/2) and v37 = 1/tan(fovX/2) (Only the lowest 32-bits are used, rest are 0’s).
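To sanity-check the reading so far, here is my own Python re-implementation of just the X/Y scale logic. It is a sketch of what the decompiled code appears to do, not the engine’s source: the function and parameter names are hypothetical, with `fov_x` standing for the float at a1 + 0x234, `aspect` for a1 + 0x18, and `fov_y_override` for a1 + 0x430.

```python
import math

def build_xy_scales(fov_x, aspect, fov_y_override=0.0):
    """Mirror the decompiled X/Y scale math; angles are in radians."""
    tan_half_x = math.tan(fov_x * 0.5)           # lines 3-5
    tan_half_y_derived = tan_half_x / aspect     # line 8
    fov_x_again = math.atan(tan_half_x) * 2.0    # lines 9-12, the redundant round trip
    fov_y = fov_y_override                       # line 10
    if fov_y == 0.0:                             # lines 13-18
        fov_y = math.atan(tan_half_y_derived) * 2.0
    x_scale = 1.0 / math.tan(fov_x_again * 0.5)  # lines 19-24
    y_scale = 1.0 / math.tan(fov_y * 0.5)        # lines 22-32
    return x_scale, y_scale

x_scale, y_scale = build_xy_scales(math.radians(105.0), 16.0 / 9.0)
print(x_scale, y_scale)
```

With the slider at 105 degrees the first value should match the xScale we scanned for in Part 4, and the second should be that value multiplied by the aspect ratio.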

Putting It All Together: The Cleaned Code

Now that we understand the math behind both the X and Y scales, we can go back into IDA, rename our variables, and comment the IDA pseudo code.

1  else
2  {
       // --- Initial Setup & FovX ---
       // Load FovX (radians) from a1 + 0x234
3      fovX_calc = (__m128)*(unsigned int *)(a1 + 0x234);
4      fovX_calc.m128_f32[0] = fovX_calc.m128_f32[0] * 0.5;
       // Arg: fovX_calc. Result: fovX_calc = tan(FovX / 2)
5      ucrtBase_Tanf();
6      fovY_calc = fovX_calc;
7      tanFovXby2_dup = fovX_calc.m128_f32[0];
       // Divide by Aspect Ratio at a1 + 0x18 (hardcoded at 1.777)
       // fovY_calc = tan(FovX / 2) / AspectRatio
8      fovY_calc.m128_f32[0] = fovX_calc.m128_f32[0] / *(float *)(a1 + 0x18);
       // --- The Redundant Call ---
       // Arg: fovX_calc. Result: atan(tan(FovX / 2)). Reverts back to FovX / 2
9      ucrtBase_aTanf();
       // --- FovY Calculation ---
       // Check if FovY is pre-calculated (usually 0)
10     fovY_val = (__m128)*(unsigned int *)(a1 + 0x430);
11     fovX_val = fovX_calc;
       // fovX_val = (FovX / 2) * 2.0 -> FovX
12     fovX_val.m128_f32[0] = fovX_calc.m128_f32[0] * 2.0;
13     if ( fovY_val.m128_f32[0] == 0.0 )
14     {
           // Arg: fovY_calc. Result: atan(tan(FovX / 2) / AspectRatio)
15         ucrtBase_aTanf();
16         fovY_val = fovY_calc;
           // fovY_val = 2 * atan(tan(FovX / 2) / AspectRatio)
17         fovY_val.m128_f32[0] = fovY_calc.m128_f32[0] * 2.0;
18     }
       // fovX_val = FovX / 2
19     fovX_val.m128_f32[0] = fovX_val.m128_f32[0] * 0.5;
       // Arg: fovX_val. Result: fovX_val = tan(FovX / 2)
20     ucrtBase_Tanf();
21     xScale = (__m128)0x3F800000u; // 1.0f
       // fovY_val = FovY / 2
22     fovY_val.m128_f32[0] = fovY_val.m128_f32[0] * 0.5;
       // Save tan(FovX / 2) for later calculations
23     tan_fovX_half_saved = fovX_val;
       // --- Final xScale Calculation ---
       // xScale = 1.0 / tan(FovX / 2)
24     xScale.m128_f32[0] = 1.0 / fovX_val.m128_f32[0];
       // Arg: fovY_val. Result: fovY_val = tan(FovY / 2)
25     ucrtBase_Tanf();
26     v39 = (__m128)*(unsigned int *)(a1 + 0x428);
27     v40 = (__m128 *)(a1 + 0xF0);
28     v41 = a1 + 0x130;
       // Save tan(FovY / 2) for later calculations
29     tan_fovY_half_saved = fovY_val;
30     v43 = (__m128)0x3F800000u; // 1.0f
31     yScale = (__m128)0x3F800000u; // 1.0f
       // --- Final yScale Calculation ---
       // yScale = 1.0 / tan(FovY / 2)
32     yScale.m128_f32[0] = 1.0 / fovY_val.m128_f32[0];

Since this part is getting long, the reversal of the depth mapping calculations will be done in the next part!

In Part 5.2, we will look at how the engine uses zNear and zFar for depth mapping, and how it uses SIMD instructions like _mm_unpacklo_ps to pack all of these isolated variables into the final 4x4 projection matrix in memory.

Part 5: Reversing Construction of the Projection Matrix

2026-04-03T18:30:00+00:00

Now that we have found the function responsible for constructing the Perspective Projection Matrix let’s begin reversing it to have a clear understanding on how the game engine constructs this matrix every frame!

We won’t be reversing the entire function we have stumbled into. The function we found with CE’s “Find out what writes to this address” feature is a very large one responsible for constructing various matrices (Camera, View, Projection, Inverse Projection, Identity Scaler, ViewProjection, InvProjCamera), and it makes multiple calls to Matrix4x4Multiply(), Matrix4x4Inverse(), and even a View Frustum Culling function. We will only be looking at the Projection and Inverse Projection construction.

Let’s look at where we were initially:

It seems to be a part of an if-else branch where our instruction is being executed inside the else block.

else block:

if block:

Now we clearly see the big picture happening here. The else block is clearly constructing a Perspective Projection Matrix, as per my reasoning in Part 4. The if block seems to be constructing an Orthographic Projection Matrix, my reasoning being these lines:

v51.m128_f32[0] = 2.0 / *(float *)(a1 + 0x3C0); // 2/width

and

v55.m128_f32[0] = (float)(1.0 / *(float *)(a1 + 0x3C4)) + (float)(1.0 / *(float *)(a1 + 0x3C4)); // 2/height

The values inside a1 + 0x3C0 and a1 + 0x3C4 have been checked to be Width and Height using CE during runtime.

These are the expected xScale and yScale values for an Orthographic Projection matrix. And look where it is stored:

*(__m128 *)(a1 + 0xF0) = v54; // v54 = [2/width, 0, 0, 0]

and

*(__m128 *)(a1 + 0x100) = _mm_unpacklo_ps((__m128)0LL, _mm_unpacklo_ps(v55, (__m128)0LL)); // Final Unpack = [0, 2/height, 0, 0]

But what the hell is an _mm_unpacklo_ps??

The basic theory is simple:

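The “lo” in “unpacklo” means we only take the lowest 64 bits (the two lowest floats) of each 128-bit source register, and “ps” stands for “Packed Single”, which tells the CPU to treat each 128-bit register as four 32-bit floats. The two low floats of each source are then interleaved.
This is basically what it does:

Suppose:

A = [ A0, A1, A2, A3 ]
B = [ B0, B1, B2, B3 ]

Result of _mm_unpacklo_ps(A, B):

[ A0, B0, A1, B1 ]

That’s all.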
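If you want to play with the interleave outside of a debugger, here is a minimal scalar sketch of it in plain C. The `vec4` struct and the names `unpacklo_ps_model` and `unpacklo_example` are my own stand-ins for illustration, not anything from the game binary or from the intrinsics headers.

```c
#include <assert.h>

/* Four packed floats, standing in for one __m128 register. */
typedef struct { float f[4]; } vec4;

/* Scalar model of _mm_unpacklo_ps(a, b): interleave the two low lanes. */
vec4 unpacklo_ps_model(vec4 a, vec4 b)
{
    vec4 r = { { a.f[0], b.f[0], a.f[1], b.f[1] } };
    return r;
}

/* Worked example: the upper lanes (A2, A3, B2, B3) never reach the result. */
void unpacklo_example(void)
{
    vec4 a = { { 1.0f, 2.0f, 3.0f, 4.0f } };   /* A0..A3 */
    vec4 b = { { 5.0f, 6.0f, 7.0f, 8.0f } };   /* B0..B3 */
    vec4 r = unpacklo_ps_model(a, b);
    assert(r.f[0] == 1.0f && r.f[1] == 5.0f);  /* A0, B0 */
    assert(r.f[2] == 2.0f && r.f[3] == 6.0f);  /* A1, B1 */
}
```

Anything sitting in the upper two lanes of either input is simply dropped, which is why nesting two of these calls with zero vectors is enough to place one scalar in any of the four slots.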

As a refresher, a standard Orthographic Projection Matrix looks like this:

\[\begin{bmatrix} \frac{2}{w} & 0 & 0 & 0 \\ 0 & \frac{2}{h} & 0 & 0 \\ 0 & 0 & \frac{1}{z_{far} - z_{near}} & 0 \\ 0 & 0 & -\frac{z_{near}}{z_{far} - z_{near}} & 1 \end{bmatrix}\]

The two rows stored above match the first two rows of this matrix, which validates our belief that the if block is constructing an Orthographic Projection Matrix. So the if-else logic would be:

if(shouldConstructOrthoProj) { ConstructOrthoProj(); } else { ConstructPerspectiveProj(); }

Now for the hard part, Reversing the complete logic inside the else block…

Reversing Construction of the Perspective-Projection Matrix

Let’s begin Reversing the else block:

xScale Construction:

Because the compiler interleaved the instructions for optimization, the calculations for the X and Y scales are tangled together. Let’s isolate just the xScale logic:

Lines highlighted in red are responsible for xScale calculation.

Let’s start with this block

3 v33 = (__m128)*(unsigned int *)(a1 + 0x234);
4 v33.m128_f32[0] = v33.m128_f32[0] * 0.5;
5 ucrtBase_Tanf();

Using CE for dynamic analysis, we can see that “(a1 + 0x234)” holds FovX in radians, “0.7673”, matching the FOV slider we set to “105 degrees” before.

It loads it as an (__m128)*(unsigned int *), which treats it as an integer, but we know it is a float. The type doesn’t matter as long as it’s 4 bytes; at the CPU level, bits are just bits.
So it loads FovX in radians into v33, then immediately multiplies it by 0.5, so v33 now holds fovX/2.

Next we see a call to ucrtBase_Tanf(), for which IDA has failed to recover the arguments. No worries, we will look at the assembly for its arguments.

Following the Windows x64 calling convention, xmm0 carries the first float argument and the result is returned in xmm0 as well. Tracing the assembly we see:

[rbx+234h] is our fovX, which gets loaded into xmm0, multiplied by 0.5 and used as the argument for the tanf call. After the call, the lowest 32 bits of the xmm0 register hold tan(fovX/2).

After this block there are lines which use FovX that I have not highlighted. This is because they are either deriving FovY from FovX or saving the current value inside v33 for later use.

Next block is:

9 ucrtBase_aTanf();
11 v36 = v33;
12 v36.m128_f32[0] = v33.m128_f32[0] * 2.0;

Moving on to the next line, we see a redundancy or quirk: a ucrtBase_aTanf() call whose argument, looking at the assembly, is v33 again. So now v33 = atanf(tanf(fovX/2)), which equals “fovX/2”.

Why?

FovX can only range from 60 to 120 degrees in game, so fovX/2 lies between 30 and 60 degrees, well inside the interval from -π/2 to π/2 where arctan(tan(x)) = x. Calling atanf just to get back fovX/2 is a waste of CPU cycles, but a negligible one.

Then:

v36 = v33, where v33 is fovX/2. The low float of v33 is then multiplied by 2 and stored into v36, so v36 effectively holds just FovX.

Next block is:

19 v36.m128_f32[0] = v36.m128_f32[0] * 0.5;
20 ucrtBase_Tanf();
23 v38 = v36;
24 v37.m128_f32[0] = 1.0 / v36.m128_f32[0];

v36 was fovX; now it’s fovX/2 after “v36 * 0.5”.

Then it calls ucrtBase_Tanf() with v36 as the argument, so the result in v36 is tan(fovX/2).

Next it saves tan(fovX/2) into v38 for later calculations and finally does “v37.m128_f32[0] = 1.0 / v36.m128_f32[0];”, completing our calculation for xScale and saving it inside v37.

\[x_{scale} = \frac{1}{\tan{\left(\frac{Fov_X}{2}\right)}}\]

yScale Construction:

Let’s start with this block:

6 v34 = v33;
8 v34.m128_f32[0] = v33.m128_f32[0] / *(float *)(a1 + 0x18);
10 v35 = (__m128)*(unsigned int *)(a1 + 0x430);

v34 is first assigned the value of tan(fovX/2) (as an __m128, so only the lowest 32 bits hold the value) and is then divided by the value at “*(float *)(a1 + 0x18)”. With dynamic analysis we can see it is a constant “1.777”, which is our aspect ratio of 16:9. The interesting part is that it is constant and won’t change even when the aspect ratio is set to 4:3 or 16:10.

So now v34 holds the value “tan(fovX/2) / Aspect Ratio”. Hmm, this formula looks familiar…

\[\tan\left(\frac{FOV_X}{2}\right) \,/\, A = \tan\left(\frac{FOV_Y}{2}\right)\]

In the next line a value from “(__m128)*(unsigned int *)(a1 + 0x430)” is loaded into v35, but with dynamic analysis we can see that most of the time it is zero.

Next block:

13 if ( v35.m128_f32[0] == 0.0 )
14 {
15 ucrtBase_aTanf();
16 v35 = v34;
17 v35.m128_f32[0] = v34.m128_f32[0] * 2.0;
18 }

If the value loaded from “(a1 + 0x430)” into “v35” is zero, it will calculate FovY from FovX with this formula:

\[FOV_Y = 2 \cdot \tan^{-1}\!\left(\frac{\tan\!\left(\frac{FOV_X}{2}\right)}{A}\right)\]

and save it inside “v35”.

v34 previously held “tan(fovX/2) / Aspect Ratio”, so after the ucrtBase_aTanf() call with “v34” as its argument, “v34” holds atan(tan(fovX/2) / AspectRatio). This gets copied into v35, and immediately afterwards v34 is multiplied by 2 and the result is stored in v35. So now v35 has the value:

“2 * atan(tan(fovX/2) / AspectRatio)”, matching our formula exactly!

Next block:

22 v35.m128_f32[0] = v35.m128_f32[0] * 0.5;
25 ucrtBase_Tanf();
29 v42 = v35;
32 v44.m128_f32[0] = 1.0 / v35.m128_f32[0];

First it calculates FovY/2 and stores it in “v35”.

Then it calls ucrtBase_Tanf() with “v35” as the argument, so “v35” now holds tan(fovY/2), which is also saved into “v42” for later calculations. Next it completes the calculation for yScale by doing:

v44.m128_f32[0] = 1.0 / v35.m128_f32[0]; so the final value inside “v44” is “1/tan(fovY/2)”.

So now v44 = 1/tan(fovY/2) and v37 = 1/tan(fovX/2) (Only the lowest 32-bits are used, rest are 0’s).

Putting It All Together: The Cleaned Code

Now that we understand the math behind both the X and Y scales, we can go back into IDA, rename our variables, and comment the IDA pseudo code.

else
{
  // --- 1. Initial Setup & FovX ---
  // Load FovX (radians) from a1 + 0x234
  fovX_calc = (__m128)*(unsigned int *)(a1 + 0x234);
  fovX_calc.m128_f32[0] = fovX_calc.m128_f32[0] * 0.5;
  // Arg: fovX_calc. Result: fovX_calc = tan(FovX / 2)
  ucrtBase_Tanf();
  fovY_calc = fovX_calc;
  tanFovXby2_dup = fovX_calc.m128_f32[0];
  // Divide by Aspect Ratio at a1 + 0x18 (hardcoded at 1.777)
  // fovY_calc = tan(FovX / 2) / AspectRatio
  fovY_calc.m128_f32[0] = fovX_calc.m128_f32[0] / *(float *)(a1 + 0x18);
  // --- 2. The Redundant Call ---
  // Arg: fovX_calc. Result: atan(tan(FovX / 2)). Reverts back to FovX / 2
  ucrtBase_aTanf();
  // --- 3. FovY Calculation ---
  // Check if FovY is pre-calculated (usually 0)
  fovY_val = (__m128)*(unsigned int *)(a1 + 0x430);
  fovX_val = fovX_calc;
  // fovX_val = (FovX / 2) * 2.0 -> FovX
  fovX_val.m128_f32[0] = fovX_calc.m128_f32[0] * 2.0;
  if ( fovY_val.m128_f32[0] == 0.0 )
  {
    // Arg: fovY_calc. Result: atan(tan(FovX / 2) / AspectRatio)
    ucrtBase_aTanf();
    fovY_val = fovY_calc;
    // fovY_val = 2 * atan(tan(FovX / 2) / AspectRatio)
    fovY_val.m128_f32[0] = fovY_calc.m128_f32[0] * 2.0;
  }
  // --- 4. Final xScale Calculation ---
  // fovX_val = FovX / 2
  fovX_val.m128_f32[0] = fovX_val.m128_f32[0] * 0.5;
  // Arg: fovX_val. Result: fovX_val = tan(FovX / 2)
  ucrtBase_Tanf();
  xScale = (__m128)0x3F800000u; // 1.0f
  // fovY_val = FovY / 2
  fovY_val.m128_f32[0] = fovY_val.m128_f32[0] * 0.5;
  // Save tan(FovX / 2) for later calculations
  tan_fovX_half_save = fovX_val;
  // Final xScale = 1.0 / tan(FovX / 2)
  xScale.m128_f32[0] = 1.0 / fovX_val.m128_f32[0];
  // --- 5. Final yScale Calculation ---
  // Arg: fovY_val. Result: fovY_val = tan(FovY / 2)
  ucrtBase_Tanf();
  v39 = (__m128)*(unsigned int *)(a1 + 0x428);
  v40 = (__m128 *)(a1 + 0xF0);
  v41 = a1 + 0x130;
  // Save tan(FovY / 2) for later calculations
  tan_fovY_half_save = fovY_val;
  v43 = (__m128)0x3F800000u; // 1.0f
  yScale = (__m128)0x3F800000u; // 1.0f
  // Final yScale = 1.0 / tan(FovY / 2)
  yScale.m128_f32[0] = 1.0 / fovY_val.m128_f32[0];
}

Part 5.2: Reversing Construction of the Projection Matrix (Depth Mapping)

2026-04-03T18:30:00+00:00

Depth Calculation:

We’ve got the xScale and yScale calculations sorted, now it’s time to see how they calculate Depth Mapping, the values at [2][2] and [3][2].

Row 2:

26 v39 = (__m128)*(unsigned int *)(a1 + 0x428);
27 v40 = (__m128 *)(a1 + 0xF0);
28 v41 = a1 + 0x130;
29 v42 = v35;
30 v43 = (__m128)0x3F800000u;
31 v44 = (__m128)0x3F800000u;
32 v44.m128_f32[0] = 1.0 / v35.m128_f32[0];
33 v45 = v32;
34 v45.m128_f32[0] = v32.m128_f32[0] - v31.m128_f32[0];
35 v46 = (__m128)*(unsigned int *)(a1 + 0x42C);
36 *(__m128 *)(a1 + 0x100) = _mm_unpacklo_ps((__m128)0LL, _mm_unpacklo_ps(v44, (__m128)0LL));
37 v43.m128_f32[0] = 1.0 / (float)(v32.m128_f32[0] - v31.m128_f32[0]);
38 *(__m128 *)(a1 + 0xF0) = _mm_unpacklo_ps(_mm_unpacklo_ps(v37, (__m128)0LL), (__m128)0LL);
39 v47 = v43;
40 v47.m128_f32[0] = v43.m128_f32[0] * v31.m128_f32[0];
41 *(__m128 *)(a1 + 0x110) = _mm_unpacklo_ps(_mm_unpacklo_ps(v39, v47), _mm_unpacklo_ps(v46, (__m128)0xBF800000));

We will isolate all things related to row 2 (yes, 2… we are counting rows from 0).

Let’s start with these lines:

26 v39 = (__m128)*(unsigned int *)(a1 + 0x428);
33 v45 = v32;

The first line seems to just load a zero from memory, but the second line is interesting…
Since we’ve never seen what “v32” holds, let’s trace back to where it gets its value from.

Notice the if (a2) block: it conditionally swaps zNear and zFar. This seems to be a case of standard depth projection versus Reverse-Z projection. Since my testing shows the if block does not execute, the engine defaults to standard depth mapping. For the sake of this walkthrough, we will proceed with the unswapped variables: v31 as zFar and v32 as zNear.

Now I will rename the variables “v31” and “v32” to zFar and zNear respectively.

So now:

33 v45 = zNear;
34 v45.m128_f32[0] = zNear.m128_f32[0] - zFar.m128_f32[0];
35 v46 = (__m128)*(unsigned int *)(a1 + 0x42C); // 0.1299999952
36 *(__m128 *)(a1 + 0x100) = _mm_unpacklo_ps((__m128)0LL, _mm_unpacklo_ps(yScale, (__m128)0LL));
37 v43.m128_f32[0] = 1.0 / (float)(zNear.m128_f32[0] - zFar.m128_f32[0]);
38 *(__m128 *)(a1 + 0xF0) = _mm_unpacklo_ps(_mm_unpacklo_ps(xScale, (__m128)0LL), (__m128)0LL);
39 v47 = v43;
40 v47.m128_f32[0] = v43.m128_f32[0] * zFar.m128_f32[0];
41 *(__m128 *)(a1 + 0x110) = _mm_unpacklo_ps(_mm_unpacklo_ps(v39, v47), _mm_unpacklo_ps(v46, (__m128)0xBF800000));

The Anomaly / Asymmetric offset

Let’s get the anomaly out of the way right now:

35 v46 = (__m128)*(unsigned int *)(a1 + 0x42C); // 0.1299999952

Through dynamic analysis we see it’s loading a constant value of “0.1299999952”, and further down the line it gets placed in the “[2][1]” slot of the projection matrix. This seems to be responsible for some kind of Asymmetric / Off-Center Frustum construction.

How about we just test this theory right now? Let’s try to remove the “0.13” constant from the Projection matrix and see how it would affect the render in-game.

Let’s do so in cheat engine by doing a trampoline hook.

original instruction:

movss xmm3,[rbx+0000042C]

our jmp:

jmp codeCave
nop

Code Cave:

xorps xmm3, xmm3
jmp back
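Cheat Engine assembles the jump for us, but it helps to know what bytes end up over the original instruction. Below is a minimal sketch of how a 5-byte relative jmp plus NOP padding can be built. The function name `build_jmp_patch` and its parameters are hypothetical, and the example assumes the overwritten movss is 8 bytes long; the sketch only fills a buffer and does not write into any process.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/*
 * Fills out with "jmp rel32" followed by NOP padding.
 * out:      buffer of at least insn_len bytes
 * insn_len: number of original bytes being replaced (at least 5)
 * src:      address of the instruction being overwritten
 * dst:      address of the code cave
 */
void build_jmp_patch(uint8_t *out, size_t insn_len, uint64_t src, uint64_t dst)
{
    assert(out != NULL && insn_len >= 5);

    /* rel32 is measured from the end of the 5-byte jmp. */
    int64_t delta = (int64_t)(dst - (src + 5));
    assert(delta >= INT32_MIN && delta <= INT32_MAX);
    uint32_t rel32 = (uint32_t)(int32_t)delta;

    out[0] = 0xE9;                              /* jmp rel32 opcode */
    out[1] = (uint8_t)(rel32 & 0xFF);           /* little-endian displacement */
    out[2] = (uint8_t)((rel32 >> 8) & 0xFF);
    out[3] = (uint8_t)((rel32 >> 16) & 0xFF);
    out[4] = (uint8_t)((rel32 >> 24) & 0xFF);
    memset(out + 5, 0x90, insn_len - 5);        /* pad leftover bytes with NOPs */
}
```

The code cave then has to end with its own jump back to src + insn_len, which is the “jmp back” line above. The padding matters because the bytes left over from the original instruction would otherwise be decoded as garbage.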

Here is a comparison in game with the anomaly factor enabled versus nullified

This pretty much confirms that this is an intentional asymmetric offset.
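A short derivation shows why zeroing this value shifts the whole picture. It is a sketch under two assumptions: that the engine multiplies row vectors (x, y, z, 1) by the matrix, so the “[2][1]” slot is scaled by z and added to the y output, and that the -1.0 packed into the same row produces the w term. The symbols y_clip, w_clip and y_ndc are my own labels, not names from the binary.

\[y_{clip} = y \cdot y_{scale} + 0.13 \cdot z, \qquad w_{clip} = -z\]

\[y_{ndc} = \frac{y_{clip}}{w_{clip}} = \frac{y \cdot y_{scale}}{-z} - 0.13\]

Under those assumptions the constant survives the perspective divide unchanged, so every point is shifted vertically by the same 0.13 in normalized device coordinates no matter how far away it is. Nullifying it in the code cave removes exactly that shift.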

Let’s now continue with depth mapping calculations.

Starting with this block:

37 v43.m128_f32[0] = 1.0 / (float)(zNear.m128_f32[0] - zFar.m128_f32[0]);
39 v47 = v43;
40 v47.m128_f32[0] = v43.m128_f32[0] * zFar.m128_f32[0];

This is really self-explanatory…

v43 = 1 / (zNear - zFar)

Then:

v47 = v43 * zFar

Putting it all together we get:

\[\large v_{47} = \frac{z_{far}}{z_{near} - z_{far}}\]

I will now rename “v47” to “zFarByzNearNegzFar”.

Next is Row3!

Row3:

37 v43.m128_f32[0] = 1.0 / (float)(zNear.m128_f32[0] - zFar.m128_f32[0]);
38 *(__m128 *)(a1 + 0xF0) = _mm_unpacklo_ps(_mm_unpacklo_ps(xScale, (__m128)0LL), (__m128)0LL);
39 v47 = v43;
40 v47.m128_f32[0] = v43.m128_f32[0] * zFar.m128_f32[0];
41 *(__m128 *)(a1 + 0x110) = _mm_unpacklo_ps(_mm_unpacklo_ps(v39, v47), _mm_unpacklo_ps(v46, (__m128)0xBF800000));
42 zFar.m128_f32[0] = zFar.m128_f32[0] * zNear.m128_f32[0];
43 v48 = zFar;
44 v48.m128_f32[0] = zFar.m128_f32[0] * v43.m128_f32[0];
45 v45.m128_f32[0] = v45.m128_f32[0] / zFar.m128_f32[0];
46 v49 = _mm_unpacklo_ps(_mm_unpacklo_ps((__m128)0LL, v48), (__m128)0LL);
47 v50 = (__m128)0x3F800000u;
48 *(__m128 *)(a1 + 0x120) = v49;

Taking this block:

37 v43.m128_f32[0] = 1.0 / (float)(zNear.m128_f32[0] - zFar.m128_f32[0]);
42 zFar.m128_f32[0] = zFar.m128_f32[0] * zNear.m128_f32[0];
43 v48 = zFar;
44 v48.m128_f32[0] = zFar.m128_f32[0] * v43.m128_f32[0];

The first line sets v43 to 1 / (zNear - zFar).

But in the next line the value inside zFar changes to zFar * zNear.

Then it’s simply: v48 = zFar * v43, where zFar now holds zFar * zNear.

Putting it all together:

\[\large v_{48} = \frac{z_{far} \cdot z_{near}}{z_{near} - z_{far}}\]
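To convince ourselves that these two terms really form a standard depth mapping, we can feed a couple of view-space depths through them. The sketch below assumes row-vector multiplication and a camera looking down negative Z (so w ends up as -z, from the -1.0 packed into row 2); `depth_terms`, `compute_depth_terms` and `ndc_depth` are hypothetical names of my own.

```c
#include <assert.h>

/* Hypothetical container for the two depth terms of the matrix. */
typedef struct { float m22; float m32; } depth_terms;

depth_terms compute_depth_terms(float z_near, float z_far)
{
    assert(z_near != z_far);
    float inv_range = 1.0f / (z_near - z_far);   /* v43 */
    depth_terms d;
    d.m22 = z_far * inv_range;                   /* v47: zFar / (zNear - zFar) */
    d.m32 = (z_far * z_near) * inv_range;        /* v48: zFar*zNear / (zNear - zFar) */
    return d;
}

/* Depth after the perspective divide for a view-space z (negative in front). */
float ndc_depth(depth_terms d, float z_view)
{
    assert(z_view != 0.0f);
    float z_clip = z_view * d.m22 + d.m32;
    float w_clip = -z_view;                      /* assumed: w = -z */
    return z_clip / w_clip;
}
```

Under those assumptions a point on the near plane (z = -zNear) lands at depth 0 and a point on the far plane (z = -zFar) lands at depth 1, which fits the earlier observation that the engine is using standard rather than Reverse-Z depth.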

Putting it all into the Projection Matrix:

Let’s isolate all lines which store the Projection Matrix into the Camera Structure:

Row 0:

Let’s see what’s happening here line by line:

38 *(__m128 *)(a1 + 0xF0) = _mm_unpacklo_ps(_mm_unpacklo_ps(xScale, (__m128)0LL), (__m128)0LL);

For a refresher on what _mm_unpacklo_ps does, see the earlier section “But what the hell is an _mm_unpacklo_ps??”.

Let’s go through what _mm_unpacklo_ps does in this line, starting with the inner unpack:

_mm_unpacklo_ps(xScale, (__m128)0LL)

The xScale variable holds 4 floats; only the lowest float is the actual “xScale” and the rest are zero, so:

xScale: x = 1/tan(fovX/2), y = 0, z = 0, w = 0

(__m128)0LL just means an __m128 variable with all floats set to 0.

Unpacking this we get:

Result = [1/tan(fovX/2), 0, 0, 0]

After this it does one more unpacklo:

_mm_unpacklo_ps(Result, (__m128)0LL)

Doing this we again get:

Result = [1/tan(fovX/2), 0, 0, 0]

So, since the upper floats of xScale are already zero, a single _mm_unpacklo_ps would have been enough…

Then it simply stores it as “ROW 0” of the Projection matrix inside the camera structure at “(a1 + 0xF0)”.

Since _mm_unpacklo_ps is pretty straightforward, I will not babysit you through every unpack for every row. Here are the lines and the results:

Row 1:

36 *(__m128 *)(a1 + 0x100) = _mm_unpacklo_ps((__m128)0LL, _mm_unpacklo_ps(yScale, (__m128)0LL));

Result: [0, 1/tan(fovY/2), 0, 0]

Row 2

41 *(__m128 *)(a1 + 0x110) = _mm_unpacklo_ps(_mm_unpacklo_ps(v39, zFarByzNearNegzFar), _mm_unpacklo_ps(v46, (__m128)0xBF800000));

Alright, let’s slow down here a bit.

Here “v39” comes from:

26 v39 = (__m128)*(unsigned int *)(a1 + 0x428);

which seems to just be zero.

v46 is the “0.1299999952” constant we’ve seen before, and “0xBF800000” is hex for “-1.f”.

unpacking all this we get:

Result: [0, 0.13, Far / (Near-Far), -1]

Row 3:

46 v49 = _mm_unpacklo_ps(_mm_unpacklo_ps((__m128)0LL, v48), (__m128)0LL);
48 *(__m128 *)(a1 + 0x120) = v49;

v48 is (far * near) / (near - far)

Result: [0, 0, (far * near) / (near - far), 0]
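If the nested unpacks for rows 2 and 3 are hard to follow by eye, they can be replayed with a scalar model. This is a sketch only: `vec4`, `unpacklo_ps_model`, `low_lane`, `pack_row2`, `pack_row3` and `pack_rows_example` are hypothetical helpers of mine, and the numbers in the example are sample values for zNear = 0.1 and zFar = 1000, not values dumped from the game.

```c
#include <assert.h>

/* Four packed floats, standing in for one __m128 register. */
typedef struct { float f[4]; } vec4;

/* Scalar model of _mm_unpacklo_ps(a, b). */
vec4 unpacklo_ps_model(vec4 a, vec4 b)
{
    vec4 r = { { a.f[0], b.f[0], a.f[1], b.f[1] } };
    return r;
}

/* A value in lane 0 with the upper lanes zeroed, as after a scalar load. */
vec4 low_lane(float x)
{
    vec4 r = { { x, 0.0f, 0.0f, 0.0f } };
    return r;
}

/* Row 2: unpacklo(unpacklo(v39, v47), unpacklo(v46, -1.0f)) */
vec4 pack_row2(float v39, float v47, float v46)
{
    vec4 lo = unpacklo_ps_model(low_lane(v39), low_lane(v47));   /* [v39, v47, 0, 0] */
    vec4 hi = unpacklo_ps_model(low_lane(v46), low_lane(-1.0f)); /* [v46, -1, 0, 0] */
    return unpacklo_ps_model(lo, hi);                            /* [v39, v46, v47, -1] */
}

/* Row 3: unpacklo(unpacklo(0, v48), 0) */
vec4 pack_row3(float v48)
{
    vec4 zero = low_lane(0.0f);
    vec4 inner = unpacklo_ps_model(zero, low_lane(v48));         /* [0, v48, 0, 0] */
    return unpacklo_ps_model(inner, zero);                       /* [0, 0, v48, 0] */
}

/* Sample values only: zNear = 0.1, zFar = 1000, offset = 0.13. */
void pack_rows_example(void)
{
    vec4 row2 = pack_row2(0.0f, -1.0001f, 0.13f);
    assert(row2.f[0] == 0.0f && row2.f[1] == 0.13f);
    assert(row2.f[2] == -1.0001f && row2.f[3] == -1.0f);

    vec4 row3 = pack_row3(-0.10001f);
    assert(row3.f[0] == 0.0f && row3.f[1] == 0.0f);
    assert(row3.f[2] == -0.10001f && row3.f[3] == 0.0f);
}
```

The intermediate comments make the trick visible: the first level of unpacks pairs values up, and the second level interleaves the pairs into their final columns.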

Putting the code together:

// Load 0.0f
26 v39 = (__m128)*(unsigned int *)(a1 + 0x428);
27 v40 = (__m128 *)(a1 + 0xF0); // Pointer to Row 0
28 v41 = a1 + 0x130;
29 tan_fovY_half_save = fovY_val;
30 v43 = (__m128)0x3F800000u; // 1.0f
31 yScale = (__m128)0x3F800000u; // 1.0f
// --- Final yScale Calculation ---
// yScale = 1.0 / tan(FovY / 2)
32 yScale.m128_f32[0] = 1.0 / fovY_val.m128_f32[0];
// Calculate zNear - zFar
33 v45 = zNear;
34 v45.m128_f32[0] = zNear.m128_f32[0] - zFar.m128_f32[0];
// Load the asymmetric frustum offset (0.13)
35 anomaly_offset = (__m128)*(unsigned int *)(a1 + 0x42C);
// --- Constructing Row 1 ---
// Pack and store Row 1: [0, yScale, 0, 0]
36 *(__m128 *)(a1 + 0x100) = _mm_unpacklo_ps((__m128)0LL, _mm_unpacklo_ps(yScale, (__m128)0LL));
// v43 = 1.0 / (zNear - zFar)
37 v43.m128_f32[0] = 1.0 / (float)(zNear.m128_f32[0] - zFar.m128_f32[0]);
// --- Constructing Row 0 ---
// Pack and store Row 0: [xScale, 0, 0, 0]
38 *(__m128 *)(a1 + 0xF0) = _mm_unpacklo_ps(_mm_unpacklo_ps(xScale, (__m128)0LL), (__m128)0LL);
// --- Constructing Row 2 ---
39 zFarByzNearNegzFar = v43;
// zFarByzNearNegzFar = (1.0 / (zNear - zFar)) * zFar
40 zFarByzNearNegzFar.m128_f32[0] = v43.m128_f32[0] * zFar.m128_f32[0];
// Pack and store Row 2: [0, 0.13, zFar / (zNear - zFar), -1.0]
// Note: 0xBF800000 is -1.0f, v39 is always 0
41 *(__m128 *)(a1 + 0x110) = _mm_unpacklo_ps(_mm_unpacklo_ps(v39, zFarByzNearNegzFar), _mm_unpacklo_ps(anomaly_offset, (__m128)0xBF800000));
// --- Constructing Row 3 ---
// Modifying zFar in place: zFar = zFar * zNear
42 zFar.m128_f32[0] = zFar.m128_f32[0] * zNear.m128_f32[0];
43 v48 = zFar;
// v48 = (zFar * zNear) * (1.0 / (zNear - zFar))
44 v48.m128_f32[0] = zFar.m128_f32[0] * v43.m128_f32[0];
// Used for inverse Projection
45 v45.m128_f32[0] = v45.m128_f32[0] / zFar.m128_f32[0];
// Pack Row 3: [0, 0, (zFar * zNear) / (zNear - zFar), 0]
46 row3_packed = _mm_unpacklo_ps(_mm_unpacklo_ps((__m128)0LL, v48), (__m128)0LL);
47 v50 = (__m128)0x3F800000u; // 1.0f
// Store Row 3
48 *(__m128 *)(a1 + 0x120) = row3_packed;

Final Layout:

Now we have finally reverse engineered how the game constructs the Perspective-Projection Matrix per frame, with its layout being:

\[\begin{bmatrix} \frac{1}{\tan(fovX/2)} & 0 & 0 & 0 \\ 0 & \frac{1}{\tan(fovY/2)} & 0 & 0 \\ 0 & 0.13 & \frac{far}{near - far} & -1 \\ 0 & 0 & \frac{far \cdot near}{near - far} & 0 \end{bmatrix}\]

Stored at: a1 + 0xF0 to a1 + 0x130 with a1 being the Camera Structure
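As a quick way to experiment with this layout outside the game, here is a small sketch that fills a 4x4 matrix the same way and pushes a view-space point through it. The types `mat4` and `vec4` and the functions `build_projection` and `transform_row` are hypothetical; the row-vector multiplication order is an assumption based on where the -1 and the 0.13 sit, not something read out of the binary.

```c
#include <assert.h>

typedef struct { float m[4][4]; } mat4;
typedef struct { float x, y, z, w; } vec4;

/* Fills the layout shown above from already computed scale factors. */
mat4 build_projection(float x_scale, float y_scale, float y_offset,
                      float z_near, float z_far)
{
    assert(z_near != z_far);
    float inv_range = 1.0f / (z_near - z_far);
    mat4 p = { { { 0.0f } } };
    p.m[0][0] = x_scale;
    p.m[1][1] = y_scale;
    p.m[2][1] = y_offset;                    /* the 0.13 anomaly */
    p.m[2][2] = z_far * inv_range;
    p.m[2][3] = -1.0f;
    p.m[3][2] = z_far * z_near * inv_range;
    return p;
}

/* Row vector times matrix (v * P); this convention is an assumption. */
vec4 transform_row(vec4 v, const mat4 *p)
{
    vec4 r;
    r.x = v.x * p->m[0][0] + v.y * p->m[1][0] + v.z * p->m[2][0] + v.w * p->m[3][0];
    r.y = v.x * p->m[0][1] + v.y * p->m[1][1] + v.z * p->m[2][1] + v.w * p->m[3][1];
    r.z = v.x * p->m[0][2] + v.y * p->m[1][2] + v.z * p->m[2][2] + v.w * p->m[3][2];
    r.w = v.x * p->m[0][3] + v.y * p->m[1][3] + v.z * p->m[2][3] + v.w * p->m[3][3];
    return r;
}
```

Transforming a point straight ahead of the camera, for example (0, 0, -10, 1), and dividing y by w gives -0.13 under these assumptions, which is the constant vertical shift caused by the anomaly value.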

The Payoff: Owning the Camera

We don’t just know what the projection matrix is; we know exactly how it gets built, instruction by instruction. And in reverse engineering, understanding construction means you have the power of interception. We are no longer limited to what the game developers expose in the settings menu; we can force the engine to render however we want.

By hooking this function and manipulating the registers before the final _mm_unpacklo_ps calls, we open the door to massive engine modifications!

I already covered what we can do with access to the projection matrix construction in the “What’s the point of doing this?” section of Part 1.

Next up, we tackle the Inverse Projection Matrix.

Part 6: Reversing Construction of the Inverse Projection Matrix

2026-04-03T18:30:00+00:00

Now for the final matrix getting constructed by the else block!

I have isolated all lines relating to the construction of the Inverse Projection Matrix:

Let’s start with row 0.

Row 0:

*(__m128 *)(a1 + 0x130) = _mm_unpacklo_ps(_mm_unpacklo_ps(fovX_val, (__m128)0LL), (__m128)0LL);

Where the last value for fovX_val was:

// fovX_val = FovX / 2
19 fovX_val.m128_f32[0] = fovX_val.m128_f32[0] * 0.5;
// Arg: fovX_val. Result: fovX_val = tan(FovX / 2)
20 ucrtBase_Tanf();

so Row 0 = [tan(fovX/2), 0, 0, 0]

Row 1:

*(__m128 *)(a1 + 0x140) = _mm_unpacklo_ps((__m128)0LL, _mm_unpacklo_ps(fovY_val, (__m128)0LL));

Where the last value for fovY_val was:

// fovY_val = FovY / 2
22 fovY_val.m128_f32[0] = fovY_val.m128_f32[0] * 0.5;
// Arg: fovY_val. Result: fovY_val = tan(FovY / 2)
25 ucrtBase_Tanf();

so Row 1 = [0, tan(fovY/2), 0, 0]

Row 2:

*(__m128 *)(a1 + 0x150) = _mm_unpacklo_ps((__m128)0LL, _mm_unpacklo_ps((__m128)0LL, v45));

“v45” comes from:

// Calculate zNear - zFar
33 v45 = zNear;
34 v45.m128_f32[0] = zNear.m128_f32[0] - zFar.m128_f32[0];
// Modifying zFar in place: zFar = zFar * zNear
42 zFar.m128_f32[0] = zFar.m128_f32[0] * zNear.m128_f32[0];
// Used for inverse Projection
45 v45.m128_f32[0] = v45.m128_f32[0] / zFar.m128_f32[0];

I don’t feel like this needs explaining, all very self explanatory:

\[\large v_{45} = \frac{z_{near} - z_{far}}{z_{far} \cdot z_{near}}\]

Final Unpack: [0, 0, 0, (near - far) / (far * near)]

Row 3:

*(__m128 *)(a1 + 0x160) = _mm_unpacklo_ps( _mm_unpacklo_ps(tan_fovX_half_save, (__m128)0xBF800000), _mm_unpacklo_ps(tan_fovY_half_save, v50));

We already know “tan_fovX_half_save” is tan(fovX/2), which just before this gets multiplied by “0”, while “tan_fovY_half_save”, which previously held “tan(fovY/2)”, is multiplied by our anomaly “0.1299999952”.

These calculations are done in these lines:

// Apply asymmetric offset (0.13) to tan(FovY/2)
49 tan_fovY_half_save.m128_f32[0] = fovY_val.m128_f32[0] * *(float *)(a1 + 0x42C); // 0.1299999952
// Apply X-offset (0.0) to tan(FovX/2)
50 tan_fovX_half_save.m128_f32[0] = fovX_val.m128_f32[0] * *(float *)(a1 + 0x428); // 0.0

Next, “v50” is simply:

// Calculate 1.0 / zNear
51 v50.m128_f32[0] = 1.0 / zNear.m128_f32[0];

and “0xBF800000” is “-1”

Final Unpack: [0, tan(fovY/2)*0.13, -1, 1/near]

Final Layout:

\[\begin{bmatrix} \tan(fovX/2) & 0 & 0 & 0 \\ 0 & \tan(fovY/2) & 0 & 0 \\ 0 & 0 & 0 & \frac{near - far}{far \cdot near} \\ 0 & \tan(fovY/2) \cdot 0.13 & -1 & \frac{1}{near} \end{bmatrix}\]

This matches what we have seen previously in cheat engine’s memory viewer!

Reversing Insight

The engine is doing something smart here. Instead of relying on a generic 4x4 matrix inversion algorithm (like Cramer’s rule), which requires a heavy, separate function call and eats up valuable CPU cycles, it performs a heavily optimized algebraic “fast inverse” inline. Because a projection matrix has so many known zeroes, the developers hardcoded the exact algebraic inverse right into the function.

Bit of a side track: initially, when staring at this block, I didn’t even realize I was looking at an Inverse Projection Matrix. In hindsight it’s very obviously an inverse projection. We unconsciously look for standard math library calls (like a MatrixInverse function), so when it’s just raw, inline floating-point math, it’s easy to miss.

So how do you prove a hunch when the code is ambiguous?

The Scientific Method of Hypothesis ➔ Observation ➔ Conclusion!

Hypothesis: This weird block of inline math is manually constructing the Inverse Projection Matrix.

Observation: I dumped the originally constructed Projection Matrix directly from memory and ran it through a standard matrix inversion script on my own. I then compared my calculated output against the values the engine was generating in this second matrix.

Conclusion: The floats lined up exactly. Hypothesis confirmed! We have found the Inverse Projection.
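The same check can be reproduced without a game dump: build both matrices from the closed forms recovered above, multiply them, and see how far the product is from the identity. This is a sketch, not the script I used; `mat4`, `mat4_mul`, `build_proj_and_inverse` and `max_identity_error` are hypothetical names, and the inputs are arbitrary sample values.

```c
#include <assert.h>

typedef struct { float m[4][4]; } mat4;

mat4 mat4_mul(const mat4 *a, const mat4 *b)
{
    mat4 r;
    for (int i = 0; i < 4; ++i) {
        for (int j = 0; j < 4; ++j) {
            float sum = 0.0f;
            for (int k = 0; k < 4; ++k)
                sum += a->m[i][k] * b->m[k][j];
            r.m[i][j] = sum;
        }
    }
    return r;
}

/* Fills proj and inv with the two layouts recovered from the else block. */
void build_proj_and_inverse(float tan_half_x, float tan_half_y, float y_offset,
                            float z_near, float z_far, mat4 *proj, mat4 *inv)
{
    assert(proj != 0 && inv != 0);
    assert(z_near != z_far && z_near != 0.0f && z_far != 0.0f);
    mat4 zero = { { { 0.0f } } };
    float inv_range = 1.0f / (z_near - z_far);

    *proj = zero;
    proj->m[0][0] = 1.0f / tan_half_x;
    proj->m[1][1] = 1.0f / tan_half_y;
    proj->m[2][1] = y_offset;
    proj->m[2][2] = z_far * inv_range;
    proj->m[2][3] = -1.0f;
    proj->m[3][2] = z_far * z_near * inv_range;

    *inv = zero;
    inv->m[0][0] = tan_half_x;
    inv->m[1][1] = tan_half_y;
    inv->m[2][3] = (z_near - z_far) / (z_far * z_near);
    inv->m[3][1] = tan_half_y * y_offset;
    inv->m[3][2] = -1.0f;
    inv->m[3][3] = 1.0f / z_near;
}

/* Largest absolute deviation of a from the identity matrix. */
float max_identity_error(const mat4 *a)
{
    float worst = 0.0f;
    for (int i = 0; i < 4; ++i) {
        for (int j = 0; j < 4; ++j) {
            float diff = a->m[i][j] - (i == j ? 1.0f : 0.0f);
            if (diff < 0.0f) diff = -diff;
            if (diff > worst) worst = diff;
        }
    }
    return worst;
}
```

Multiplying in either order should leave only float rounding noise as the error. If one of the recovered terms were wrong, say 1/near in the last row, the product would show a clearly non-zero off-diagonal entry.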

The Full Picture:

1 else
2 {
// --- 1. Initial Setup & FovX ---
// Load FovX (radians) from a1 + 0x234
3 fovX_calc = (__m128)*(unsigned int *)(a1 + 0x234);
4 fovX_calc.m128_f32[0] = fovX_calc.m128_f32[0] * 0.5;
// Arg: fovX_calc. Result: fovX_calc = tan(FovX / 2)
5 ucrtBase_Tanf();
6 fovY_calc = fovX_calc;
7 tanFovXby2_dup = fovX_calc.m128_f32[0];
// Divide by Aspect Ratio at a1 + 0x18 (hardcoded at 1.777)
// fovY_calc = tan(FovX / 2) / AspectRatio
8 fovY_calc.m128_f32[0] = fovX_calc.m128_f32[0] / *(float *)(a1 + 0x18);
// --- 2. The Redundant Call ---
// Arg: fovX_calc. Result: atan(tan(FovX / 2)). Reverts back to FovX / 2
9 ucrtBase_aTanf();
// --- 3. FovY Calculation ---
// Check if FovY is pre-calculated (usually 0)
10 fovY_val = (__m128)*(unsigned int *)(a1 + 0x430);
11 fovX_val = fovX_calc;
// fovX_val = (FovX / 2) * 2.0 -> FovX
12 fovX_val.m128_f32[0] = fovX_calc.m128_f32[0] * 2.0;
13 if ( fovY_val.m128_f32[0] == 0.0 )
14 {
// Arg: fovY_calc. Result: atan(tan(FovX / 2) / AspectRatio)
15 ucrtBase_aTanf();
16 fovY_val = fovY_calc;
// fovY_val = 2 * atan(tan(FovX / 2) / AspectRatio)
17 fovY_val.m128_f32[0] = fovY_calc.m128_f32[0] * 2.0;
18 }
// fovX_val = FovX / 2
19 fovX_val.m128_f32[0] = fovX_val.m128_f32[0] * 0.5;
// Arg: fovX_val. Result: fovX_val = tan(FovX / 2)
20 ucrtBase_Tanf();
21 xScale = (__m128)0x3F800000u; // 1.0f
// fovY_val = FovY / 2
22 fovY_val.m128_f32[0] = fovY_val.m128_f32[0] * 0.5;
// Save tan(FovX / 2) for later calculations
23 tan_fovX_half_save = fovX_val;
// --- 4. Final xScale Calculation ---
// xScale = 1.0 / tan(FovX / 2)
24 xScale.m128_f32[0] = 1.0 / fovX_val.m128_f32[0];
// Arg: fovY_val. Result: fovY_val = tan(FovY / 2)
25 ucrtBase_Tanf();
26 v39 = (__m128)*(unsigned int *)(a1 + 0x428); // Load 0.0f
27 v40 = (__m128 *)(a1 + 0xF0); // Pointer to Row 0
28 v41 = a1 + 0x130;
// Save tan(FovY / 2) for later calculations
29 tan_fovY_half_save = fovY_val;
30 v43 = (__m128)0x3F800000u; // 1.0f
31 yScale = (__m128)0x3F800000u; // 1.0f
// --- 5. Final yScale Calculation ---
// yScale = 1.0 / tan(FovY / 2)
32 yScale.m128_f32[0] = 1.0 / fovY_val.m128_f32[0];
// --- 6. Depth Mapping Construction (Projection Matrix) ---
// Calculate zNear - zFar
33 v45 = zNear;
34 v45.m128_f32[0] = zNear.m128_f32[0] - zFar.m128_f32[0];
// Load the asymmetric frustum offset (0.13)
35 anomaly_offset = (__m128)*(unsigned int *)(a1 + 0x42C);
// Pack and store Row 1: [0, yScale, 0, 0]
36 *(__m128 *)(a1 + 0x100) = _mm_unpacklo_ps((__m128)0LL, _mm_unpacklo_ps(yScale, (__m128)0LL));
// v43 = 1.0 / (zNear - zFar)
37 v43.m128_f32[0] = 1.0 / (float)(zNear.m128_f32[0] - zFar.m128_f32[0]);
// Pack and store Row 0: [xScale, 0, 0, 0]
38 *(__m128 *)(a1 + 0xF0) = _mm_unpacklo_ps(_mm_unpacklo_ps(xScale, (__m128)0LL), (__m128)0LL);
39 zFarByzNearNegzFar = v43;
// zFarByzNearNegzFar = (1.0 / (zNear - zFar)) * zFar
40 zFarByzNearNegzFar.m128_f32[0] = v43.m128_f32[0] * zFar.m128_f32[0];
// Pack and store Row 2: [0, 0.13, zFar / (zNear - zFar), -1.0]
// Note: 0xBF800000 is -1.0f, v39 is always 0
41 *(__m128 *)(a1 + 0x110) = _mm_unpacklo_ps(_mm_unpacklo_ps(v39, zFarByzNearNegzFar), _mm_unpacklo_ps(anomaly_offset, (__m128)0xBF800000));
// Modifying zFar in place: zFar = zFar * zNear
42 zFar.m128_f32[0] = zFar.m128_f32[0] * zNear.m128_f32[0];
43 v48 = zFar;
// v48 = (zFar * zNear) * (1.0 / (zNear - zFar))
44 v48.m128_f32[0] = zFar.m128_f32[0] * v43.m128_f32[0];
// Used later for inverse Projection calculation
45 v45.m128_f32[0] = v45.m128_f32[0] / zFar.m128_f32[0];
// Pack Row 3: [0, 0, (zFar * zNear) / (zNear - zFar), 0]
46 row3_packed = _mm_unpacklo_ps(_mm_unpacklo_ps((__m128)0LL, v48), (__m128)0LL);
47 v50 = (__m128)0x3F800000u; // 1.0f
// Store Row 3
48 *(__m128 *)(a1 + 0x120) = row3_packed;
// --- 7. Inverse Projection Matrix Construction ---
// Apply asymmetric offset (0.13) to tan(FovY/2)
49 tan_fovY_half_save.m128_f32[0] = fovY_val.m128_f32[0] * *(float *)(a1 + 0x42C); // 0.1299999952
// Apply X-offset (0.0) to tan(FovX/2)
50 tan_fovX_half_save.m128_f32[0] = fovX_val.m128_f32[0] * *(float *)(a1 + 0x428); // 0.0
// Calculate 1.0 / zNear
51 v50.m128_f32[0] = 1.0 / zNear.m128_f32[0];
// Store Inverse Matrix Row 0: [tan(FovX/2), 0, 0, 0]
52 *(__m128 *)(a1 + 0x130) = _mm_unpacklo_ps(_mm_unpacklo_ps(fovX_val, (__m128)0LL), (__m128)0LL);
// Store Inverse Matrix Row 1: [0, tan(FovY/2), 0, 0]
53 *(__m128 *)(a1 + 0x140) = _mm_unpacklo_ps((__m128)0LL, _mm_unpacklo_ps(fovY_val, (__m128)0LL));
// Store Inverse Matrix Row 2: [0, 0, 0, (near - far) / (far * near)]
54 *(__m128 *)(a1 + 0x150) = _mm_unpacklo_ps((__m128)0LL, _mm_unpacklo_ps((__m128)0LL, v45));
// Store Inverse Matrix Row 3: [0, tan(fovY/2)*0.13, -1, 1/near]
55 *(__m128 *)(a1 + 0x160) = _mm_unpacklo_ps( _mm_unpacklo_ps(tan_fovX_half_save, (__m128)0xBF800000), _mm_unpacklo_ps(tan_fovY_half_save, v50));
56 *(float *)(a1 + 0x1C) = 1.0 / fminf(tanFovXby2_dup, v34.m128_f32[0]);
57 }

Conclusion: Owning the Pipeline

And there we have it. We have successfully reverse-engineered the complete perspective and inverse projection matrix construction inside the Dunia Engine.

We are now free to intercept, modify and read the matrices. We now control what the game can see.

A trampoline hook here, and we can control how the game engine goes from View -> Clip space.

Hopefully, this series has demystified the process and given you the tools to tackle your own reverse engineering targets. The math might look intimidating at first, but at the CPU level, it all breaks down to logic, patterns, and proving your hypotheses.

Happy reversing!