<?xml version="1.0" encoding="utf-8"?><feed xmlns="http://www.w3.org/2005/Atom" ><generator uri="https://jekyllrb.com/" version="3.10.0">Jekyll</generator><link href="https://zero-irp.github.io/Redundancy-seen-in-AAA-game-engines/feed.xml" rel="self" type="application/atom+xml" /><link href="https://zero-irp.github.io/Redundancy-seen-in-AAA-game-engines/" rel="alternate" type="text/html" /><updated>2026-07-02T00:40:41+00:00</updated><id>https://zero-irp.github.io/Redundancy-seen-in-AAA-game-engines/feed.xml</id><title type="html">Redundancy and Bloat seen in AAA Game Engines</title><subtitle>Overengineering in AAA Game Engines</subtitle><author><name>z1rp</name></author><entry><title type="html">Part 1: Dunia Engine (Redundancy And Over-Engineering)</title><link href="https://zero-irp.github.io/Redundancy-seen-in-AAA-game-engines/part-1-dunia/" rel="alternate" type="text/html" title="Part 1: Dunia Engine (Redundancy And Over-Engineering)" /><published>2026-05-25T18:30:00+00:00</published><updated>2026-05-25T18:30:00+00:00</updated><id>https://zero-irp.github.io/Redundancy-seen-in-AAA-game-engines/part-1-dunia</id><content type="html" xml:base="https://zero-irp.github.io/Redundancy-seen-in-AAA-game-engines/part-1-dunia/"><![CDATA[<style>
.post-nav {
  display: flex;
  margin-top: 40px;
  padding-top: 20px;
  border-top: 1px solid #444;
  font-family: Consolas, "Liberation Mono", Menlo, monospace;
  font-size: 15px;
}
.post-nav a {
  color: #569cd6;
  text-decoration: none;
  padding: 10px 16px;
  background: #1e1e1e;
  border-radius: 6px;
  transition: background 0.2s ease;
}
.post-nav a:hover {
  background: #2d2d2d;
}
/* This specific class pushes the button to the right */
.next-only {
  margin-left: auto;
}
</style>

<style>
.ida-code{
  background:#1e1e1e;
  color:#dcdcdc;
  padding:12px;
  border-radius:8px;
  font-family: Consolas, "Liberation Mono", Menlo, monospace;
  font-size:14px;
  line-height:1.4;
  overflow-x:auto;
  white-space: pre; /* preserve spaces and linebreaks */
}

/* token classes you can use inside the div */
.ida-code .kw    { color:#569cd6; } /* keywords */
.ida-code .type  { color:#4ec9b0; } /* types */
.ida-code .fn    { color:#dcdcaa; } /* functions / intrinsics */
.ida-code .num   { color:#b5cea8; } /* numbers / hex */
.ida-code .var   { color:#9cdcfe; } /* variables */
.ida-code .const { color:#ce9178; } /* globals / constants */
.ida-code .comment{ color:#6a9955; font-style:italic; }
</style>

<h3 id="intro">Intro:</h3>

<p>Before I delve into this I need to get a few things out of the way ⚠️:</p>

<p>To be completely clear: I did not go looking for this. I was simply reversing engines to study how they construct their fundamental transformation matrices and handle temporal jitter logic. But when you read a core rendering function, overhead this avoidable practically hits you in the face. You are not just looking at unoptimized code, but at codebase culture.</p>

<p>Nowww, the functions we are going to dissect only run a few times per frame, so it really doesn’t matter whether we optimize them or not. But it raises an uncomfortable question: if such optimizations are not even considered on these important rendering pathways, then what other functions are overlooked? Does this culture infest every other system in the engine?</p>

<p>This is a symptom of “Profiler-Invisible-Waste”; it is a classic case of missing the forest for the trees. You will never find this issue through a profiler, because no single function stands out: the culture is ingrained in every function. This leads to “Death By a Thousand Cuts”.</p>

<p><img src="/Redundancy-seen-in-AAA-game-engines/assets/images/part-1/vtune.png" alt="ESP-Image1" /></p>

<p>Look at this VTune capture. The top hotspots barely break 5% of the CPU time each. But look at the red box: 80.9% of the execution time is buried in <code class="language-plaintext highlighter-rouge">[Others]</code>. Profilers are designed to find massive, isolated bottlenecks.</p>

<p>But if your entire codebase is built on over-engineered abstractions and bloated generic math wrappers, the baseline execution cost of every function is raised. You don’t get a few obvious performance spikes; you get a uniformly elevated floor.</p>

<p>That 80% block is exactly where those “thousand cuts” are hiding.</p>

<p>We will go over 3 separate game engines:</p>

<ol>
  <li>Dunia Engine (Far Cry Series)</li>
  <li>Sucker Punch Studios Proprietary Engine (Ghost of Tsushima)</li>
  <li>Avalanche Engine (Just Cause Series)</li>
</ol>

<p>And all 3 have one common “Antagonist”.</p>

<h3 id="basic-outline">Basic Outline:</h3>

<p><strong>Redundancy and Over-Engineering:</strong></p>

<p>First we will go over the Redundant and Over-Engineered code that exists in the engine, then explain why it is Redundant/Over-Engineered and how it could have been written instead. I will show the difference in instruction count side-by-side (Original vs Hand-Written Assembly).</p>

<p>We will also come up with theories on why this happens, specifically around the idea that “Clean C++ Code ≠ Clean Compiled Code”.</p>

<p><strong>Compatibility, Legacy and Readable Code</strong></p>

<p><em>The Compatibility Tax: Not Wrong, Just Capped.</em> In my next write-up we will look at the realities of AAA game development. Not all of this is the result of developers blindly trusting compilers. I will show sections of the engine where the code could theoretically run way faster (with some functions reaching 5× speedups) thanks to newer processor features, but where those features can’t be fully used because the engine must maintain compatibility with a wide range of hardware.</p>

<p>This whole write-up is just food for thought, a question really: “What is the performance tax for such abstractions?”</p>

<h3 id="case-1-y-up-to-z-up-over-engineering-dunia-engine">Case 1: Y-up To Z-up Over-Engineering (Dunia Engine)</h3>

<p><img src="/Redundancy-seen-in-AAA-game-engines/assets/images/part-1/YupZupIDA.png" alt="ESP-Image1" /></p>

<p>This is doing a Coordinate Space Conversion in a very “Textbooky” way. They first construct a Matrix:</p>

\[M =
\begin{bmatrix}
1 &amp; 0 &amp; 0 &amp; 0 \\
0 &amp; -4.37 \times 10^{-8} &amp; 1 &amp; 0 \\
0 &amp; -1 &amp; -4.37 \times 10^{-8} &amp; 0 \\
0 &amp; 0 &amp; 0 &amp; 1
\end{bmatrix}\]

<p>on the stack, probably using some function like <code class="language-plaintext highlighter-rouge">Matrix::CreateRotationX(-PI / 2)</code></p>

<p>So it actually constructed:</p>

\[\begin{bmatrix}
1 &amp; 0 &amp; 0 &amp; 0 \\
0 &amp; \cos(-90^\circ) &amp; -\sin(-90^\circ) &amp; 0 \\
0 &amp; \sin(-90^\circ) &amp; \cos(-90^\circ) &amp; 0 \\
0 &amp; 0 &amp; 0 &amp; 1
\end{bmatrix}\]
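<p>To make the stray -4.37 × 10^-8 less mysterious, here is a minimal sketch of what such a generic helper might look like. <code>Mat4</code>, <code>CreateRotationX</code> and <code>FloatFromBits</code> are hypothetical names chosen for illustration; the engine's real types and signatures are not known. The point is only that the cosine of a single-precision -π/2 is not exactly zero, and that the hex constants in the disassembly are plain IEEE-754 floats.</p>

```cpp
#include <cassert>
#include <cmath>
#include <cstdint>
#include <cstring>

// Hypothetical 4x4 row-major matrix; the engine's real type is unknown.
struct Mat4 {
    float m[4][4];
};

// Hypothetical generic helper in the spirit of the guessed
// Matrix::CreateRotationX: it evaluates sin/cos for any angle,
// even when the caller only ever passes -PI / 2.
Mat4 CreateRotationX(float angle) {
    assert(std::isfinite(angle));
    const float c = std::cos(angle);
    const float s = std::sin(angle);
    Mat4 r = {{
        {1.0f, 0.0f, 0.0f, 0.0f},
        {0.0f, c,    -s,   0.0f},
        {0.0f, s,    c,    0.0f},
        {0.0f, 0.0f, 0.0f, 1.0f},
    }};
    return r;
}

// Reinterpret a 32-bit immediate from the disassembly as an IEEE-754 float.
float FloatFromBits(std::uint32_t bits) {
    static_assert(sizeof(float) == sizeof(std::uint32_t), "expects 32-bit float");
    float f;
    std::memcpy(&f, &bits, sizeof f);
    return f;
}
```

<p>With these helpers, <code>FloatFromBits(0x3F800000)</code> and <code>FloatFromBits(0xBF800000)</code> decode to 1.0 and -1.0, while <code>FloatFromBits(0xB33BBD2E)</code> decodes to roughly -4.37e-8: a rounding residue, since a single-precision π/2 is slightly larger than the real π/2.</p>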

<p>Then it passes both matrices to <code class="language-plaintext highlighter-rouge">MatrixMultiply4x4(&amp;CameraMatrix,  (__int64)&amp;matrixPointer);</code></p>

<p>Where the function computes <code class="language-plaintext highlighter-rouge">a1 = a2 × a1</code>, writing the product back into its first argument.</p>
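<p>As a rough scalar sketch of that in-place multiply (an assumption reconstructed from the disassembly shown further down, not the engine's source; the <code>Mat4</code> type and the exact signature are hypothetical), each row of <code>a2</code> is combined with all four rows of <code>a1</code>, the result goes to a stack temporary, and the temporary is copied back over <code>a1</code>:</p>

```cpp
#include <cassert>
#include <cstring>

// Hypothetical 4x4 row-major matrix; the engine's real type is unknown.
struct Mat4 {
    float m[4][4];
};

// Sketch of the in-place update a1 = a2 * a1.
void MatrixMultiply4x4(Mat4* a1, const Mat4* a2) {
    assert(a1 != nullptr && a2 != nullptr);
    Mat4 tmp;  // stack temporary, like the one the listing writes through rax
    for (int i = 0; i < 4; ++i) {        // one pass per row of a2
        for (int j = 0; j < 4; ++j) {    // the four lanes one SSE register holds
            tmp.m[i][j] = a2->m[i][0] * a1->m[0][j]
                        + a2->m[i][1] * a1->m[1][j]
                        + a2->m[i][2] * a1->m[2][j]
                        + a2->m[i][3] * a1->m[3][j];
        }
    }
    std::memcpy(a1, &tmp, sizeof tmp);   // copy the product back over a1
}
```

<p>The temporary is needed because every output row reads all four rows of <code>a1</code>; writing straight into <code>a1</code> would corrupt rows that later iterations still depend on.</p>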

<p>Let’s put a breakpoint in the multiply call to see exactly what happens after this multiplication:</p>

<blockquote>
  <p>I am looking exactly “north” in-game when breakpointing</p>
</blockquote>

<p>Before:</p>

<p><img src="/Redundancy-seen-in-AAA-game-engines/assets/images/part-1/beforeYup.png" alt="ESP-Image1" /></p>

<p>After:</p>

<p><img src="/Redundancy-seen-in-AAA-game-engines/assets/images/part-1/afterYup.png" alt="ESP-Image1" /></p>

<p>So the net effect is: take the up vector, negate it, and put it in row 2; then take the forward vector and put it in row 1.</p>
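<p>The same result falls out on paper. Writing the rows of the camera matrix as r0 to r3 and the -4.37 × 10^-8 residue as ε (notation introduced here purely for illustration), multiplying by M gives:</p>

\[M
\begin{bmatrix} r_0 \\ r_1 \\ r_2 \\ r_3 \end{bmatrix}
=
\begin{bmatrix} r_0 \\ \varepsilon r_1 + r_2 \\ -r_1 + \varepsilon r_2 \\ r_3 \end{bmatrix}
\approx
\begin{bmatrix} r_0 \\ r_2 \\ -r_1 \\ r_3 \end{bmatrix}\]

<p>Rows 0 and 3 pass through untouched, and the ε terms only add a tiny error on top of a plain swap-and-negate of rows 1 and 2.</p>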

<p>Let me put into perspective how many assembly instructions were executed just to do this:</p>

<blockquote>
  <p>Pseudo code of MatrixMultiply4x4:
<img src="/Redundancy-seen-in-AAA-game-engines/assets/images/part-1/multiplyIDA.png" alt="ESP-Image1" /></p>
</blockquote>

<style>
.asm64-wrap { display: flex; flex-direction: column; gap: 18px; padding: 4px 0; }
.asm64-section-label {
  font-size: 13px;
  color: var(--color-text-secondary);
  margin-bottom: 4px;
  font-family: var(--font-sans);
}
.asm64-code {
  background: #1e1e1e;
  color: #c5c8c6;
  padding: 12px;
  border-radius: 8px;
  font-family: "Cascadia Code", Consolas, "Liberation Mono", Menlo, monospace;
  font-size: 13px;
  line-height: 1.45;
  overflow-x: auto;
  white-space: pre;
  margin: 0;
}
.asm64-code .kw     { color: #d78700; font-weight: 600; }
.asm64-code .reg    { color: #5fafaf; }
.asm64-code .mem    { color: #af87d7; }
.asm64-code .num    { color: #b5cea8; }
.asm64-code .label  { color: #ffaf5f; }
.asm64-code .comment{ color: #6a9955; font-style: italic; }
.asm64-code .const  { color: #ce9178; }
</style>

<div class="asm64-wrap">

First load Camera Matrix:

<div>
<div class="asm64-code"><span class="kw">movaps</span>  <span class="reg">xmm0</span>, <span class="kw">xmmword ptr</span> <span class="mem">[rdi]</span>
<span class="kw">movaps</span>  <span class="reg">xmm1</span>, <span class="kw">xmmword ptr</span> <span class="mem">[rbx+<span class="num">0C0h</span>]</span>
<span class="kw">movaps</span>  <span class="mem">[rsp+<span class="num">1A0h</span>+<span class="const">CameraMatrix</span>]</span>, <span class="reg">xmm0</span>
<span class="kw">movaps</span>  <span class="reg">xmm0</span>, <span class="kw">xmmword ptr</span> <span class="mem">[rbx+<span class="num">0D0h</span>]</span>
<span class="kw">movaps</span>  <span class="mem">[rsp+<span class="num">1A0h</span>+<span class="const">CamMatRow1</span>]</span>, <span class="reg">xmm1</span>
<span class="kw">movaps</span>  <span class="reg">xmm1</span>, <span class="kw">xmmword ptr</span> <span class="mem">[rbx+<span class="num">0E0h</span>]</span>
<span class="kw">movaps</span>  <span class="mem">[rsp+<span class="num">1A0h</span>+<span class="const">CamMatRow2</span>]</span>, <span class="reg">xmm0</span>
<span class="kw">movaps</span>  <span class="mem">[rsp+<span class="num">1A0h</span>+<span class="const">CamMatRow3</span>]</span>, <span class="reg">xmm1</span></div>
</div>

Construct the Swizzle Matrix:

<div>
<div class="asm64-code"><span class="kw">mov</span>     <span class="kw">qword ptr</span> <span class="mem">[rsp+<span class="num">1A0h</span>+<span class="const">matrixPointer</span>]</span>, <span class="num">3F800000h</span>
<span class="kw">mov</span>     <span class="kw">dword ptr</span> <span class="mem">[rbp+<span class="num">0A0h</span>+<span class="const">var_120</span>]</span>, <span class="num">0</span>
<span class="kw">mov</span>     <span class="kw">dword ptr</span> <span class="mem">[rbp+<span class="num">0A0h</span>+<span class="const">var_110</span>]</span>, <span class="num">0</span>
<span class="kw">mov</span>     <span class="kw">qword ptr</span> <span class="mem">[rbp+<span class="num">0A0h</span>+<span class="const">var_100</span>]</span>, <span class="num">0</span>
<span class="kw">mov</span>     <span class="kw">dword ptr</span> <span class="mem">[rbp+<span class="num">0A0h</span>+<span class="const">var_120</span>+<span class="num">4</span>]</span>, <span class="num">0B33BBD2Eh</span>
<span class="kw">mov</span>     <span class="kw">dword ptr</span> <span class="mem">[rbp+<span class="num">0A0h</span>+<span class="const">var_110</span>+<span class="num">4</span>]</span>, <span class="num">0BF800000h</span>
<span class="kw">mov</span>     <span class="kw">qword ptr</span> <span class="mem">[rsp+<span class="num">1A0h</span>+<span class="const">matrixPointer</span>+<span class="num">8</span>]</span>, <span class="num">0</span>
<span class="kw">mov</span>     <span class="kw">qword ptr</span> <span class="mem">[rbp+<span class="num">0A0h</span>+<span class="const">var_120</span>+<span class="num">8</span>]</span>, <span class="num">3F800000h</span>
<span class="kw">mov</span>     <span class="kw">dword ptr</span> <span class="mem">[rbp+<span class="num">0A0h</span>+<span class="const">var_110</span>+<span class="num">8</span>]</span>, <span class="num">0B33BBD2Eh</span>
<span class="kw">mov</span>     <span class="kw">dword ptr</span> <span class="mem">[rbp+<span class="num">0A0h</span>+<span class="const">var_100</span>+<span class="num">8</span>]</span>, <span class="num">0</span>
<span class="kw">mov</span>     <span class="kw">dword ptr</span> <span class="mem">[rbp+<span class="num">0A0h</span>+<span class="const">var_110</span>+<span class="num">0Ch</span>]</span>, <span class="num">0</span>
<span class="kw">mov</span>     <span class="kw">dword ptr</span> <span class="mem">[rbp+<span class="num">0A0h</span>+<span class="const">var_100</span>+<span class="num">0Ch</span>]</span>, <span class="num">3F800000h</span></div>
</div>

Then load arguments in registers:

<div>
<div class="asm64-code"><span class="kw">lea</span>     <span class="reg">rdx</span>, <span class="mem">[rsp+<span class="num">1A0h</span>+<span class="const">matrixPointer</span>]</span>
<span class="kw">lea</span>     <span class="reg">rcx</span>, <span class="mem">[rsp+<span class="num">1A0h</span>+<span class="const">CameraMatrix</span>]</span></div>
</div>

Call the function:

<div>
<div class="asm64-code"><span class="kw">call</span>    <span class="label">MatrixMultiply4x4</span></div>
</div>

Inside function entry (stack allocation, set up security cookies, load registers etc):

<div>
<div class="asm64-code"><span class="kw">sub</span>     <span class="reg">rsp</span>, <span class="num">78h</span>
<span class="kw">movaps</span>  <span class="mem">[rsp+<span class="num">78h</span>+<span class="const">var_18</span>]</span>, <span class="reg">xmm6</span>
<span class="kw">movaps</span>  <span class="mem">[rsp+<span class="num">78h</span>+<span class="const">var_28</span>]</span>, <span class="reg">xmm7</span>
<span class="kw">mov</span>     <span class="reg">rax</span>, <span class="const">cs:__security_cookie</span>
<span class="kw">xor</span>     <span class="reg">rax</span>, <span class="reg">rsp</span>
<span class="kw">mov</span>     <span class="mem">[rsp+<span class="num">78h</span>+<span class="const">var_38</span>]</span>, <span class="reg">rax</span>
<span class="kw">movaps</span>  <span class="reg">xmm4</span>, <span class="kw">xmmword ptr</span> <span class="mem">[rcx]</span>
<span class="kw">lea</span>     <span class="reg">r8</span>, <span class="mem">[rsp+<span class="num">78h</span>+<span class="const">var_78</span>]</span>
<span class="kw">movaps</span>  <span class="reg">xmm5</span>, <span class="kw">xmmword ptr</span> <span class="mem">[rcx+<span class="num">10h</span>]</span>
<span class="kw">lea</span>     <span class="reg">rax</span>, <span class="mem">[rsp+<span class="num">78h</span>+<span class="const">var_78</span>]</span>
<span class="kw">movaps</span>  <span class="reg">xmm6</span>, <span class="kw">xmmword ptr</span> <span class="mem">[rcx+<span class="num">20h</span>]</span>
<span class="kw">sub</span>     <span class="reg">rdx</span>, <span class="reg">r8</span>
<span class="kw">movaps</span>  <span class="reg">xmm7</span>, <span class="kw">xmmword ptr</span> <span class="mem">[rcx+<span class="num">30h</span>]</span>
<span class="kw">mov</span>     <span class="reg">r8d</span>, <span class="num">4</span>
<span class="kw">nop</span>     <span class="kw">dword ptr</span> <span class="mem">[rax]</span></div>
</div>

Multiply each row of the swizzle matrix by the camera matrix (loop body, executed 4 times):

<div>
<div class="asm64-code"><span class="comment">; --- iteration 1 ---</span>
<span class="kw">movaps</span>  <span class="reg">xmm2</span>, <span class="kw">xmmword ptr</span> <span class="mem">[rdx+rax]</span>
<span class="kw">movaps</span>  <span class="reg">xmm3</span>, <span class="reg">xmm2</span>
<span class="kw">movaps</span>  <span class="reg">xmm0</span>, <span class="reg">xmm2</span>
<span class="kw">shufps</span>  <span class="reg">xmm3</span>, <span class="reg">xmm2</span>, <span class="num">55h</span> <span class="comment">; 'U'</span>
<span class="kw">movaps</span>  <span class="reg">xmm1</span>, <span class="reg">xmm2</span>
<span class="kw">shufps</span>  <span class="reg">xmm0</span>, <span class="reg">xmm2</span>, <span class="num">0</span>
<span class="kw">mulps</span>   <span class="reg">xmm3</span>, <span class="reg">xmm5</span>
<span class="kw">shufps</span>  <span class="reg">xmm1</span>, <span class="reg">xmm2</span>, <span class="num">0AAh</span>
<span class="kw">mulps</span>   <span class="reg">xmm0</span>, <span class="reg">xmm4</span>
<span class="kw">mulps</span>   <span class="reg">xmm1</span>, <span class="reg">xmm6</span>
<span class="kw">shufps</span>  <span class="reg">xmm2</span>, <span class="reg">xmm2</span>, <span class="num">0FFh</span>
<span class="kw">addps</span>   <span class="reg">xmm3</span>, <span class="reg">xmm0</span>
<span class="kw">mulps</span>   <span class="reg">xmm2</span>, <span class="reg">xmm7</span>
<span class="kw">addps</span>   <span class="reg">xmm3</span>, <span class="reg">xmm1</span>
<span class="kw">addps</span>   <span class="reg">xmm3</span>, <span class="reg">xmm2</span>
<span class="kw">movaps</span>  <span class="kw">xmmword ptr</span> <span class="mem">[rax]</span>, <span class="reg">xmm3</span>
<span class="kw">add</span>     <span class="reg">rax</span>, <span class="num">10h</span>
<span class="kw">sub</span>     <span class="reg">r8</span>, <span class="num">1</span>
<span class="kw">jnz</span>     <span class="label">short loc_712D0F0</span>
<span class="comment">; --- iteration 2 ---</span>
<span class="kw">movaps</span>  <span class="reg">xmm2</span>, <span class="kw">xmmword ptr</span> <span class="mem">[rdx+rax]</span>
<span class="kw">movaps</span>  <span class="reg">xmm3</span>, <span class="reg">xmm2</span>
<span class="kw">movaps</span>  <span class="reg">xmm0</span>, <span class="reg">xmm2</span>
<span class="kw">shufps</span>  <span class="reg">xmm3</span>, <span class="reg">xmm2</span>, <span class="num">55h</span> <span class="comment">; 'U'</span>
<span class="kw">movaps</span>  <span class="reg">xmm1</span>, <span class="reg">xmm2</span>
<span class="kw">shufps</span>  <span class="reg">xmm0</span>, <span class="reg">xmm2</span>, <span class="num">0</span>
<span class="kw">mulps</span>   <span class="reg">xmm3</span>, <span class="reg">xmm5</span>
<span class="kw">shufps</span>  <span class="reg">xmm1</span>, <span class="reg">xmm2</span>, <span class="num">0AAh</span>
<span class="kw">mulps</span>   <span class="reg">xmm0</span>, <span class="reg">xmm4</span>
<span class="kw">mulps</span>   <span class="reg">xmm1</span>, <span class="reg">xmm6</span>
<span class="kw">shufps</span>  <span class="reg">xmm2</span>, <span class="reg">xmm2</span>, <span class="num">0FFh</span>
<span class="kw">addps</span>   <span class="reg">xmm3</span>, <span class="reg">xmm0</span>
<span class="kw">mulps</span>   <span class="reg">xmm2</span>, <span class="reg">xmm7</span>
<span class="kw">addps</span>   <span class="reg">xmm3</span>, <span class="reg">xmm1</span>
<span class="kw">addps</span>   <span class="reg">xmm3</span>, <span class="reg">xmm2</span>
<span class="kw">movaps</span>  <span class="kw">xmmword ptr</span> <span class="mem">[rax]</span>, <span class="reg">xmm3</span>
<span class="kw">add</span>     <span class="reg">rax</span>, <span class="num">10h</span>
<span class="kw">sub</span>     <span class="reg">r8</span>, <span class="num">1</span>
<span class="kw">jnz</span>     <span class="label">short loc_712D0F0</span>
<span class="comment">; --- iteration 3 ---</span>
<span class="kw">movaps</span>  <span class="reg">xmm2</span>, <span class="kw">xmmword ptr</span> <span class="mem">[rdx+rax]</span>
<span class="kw">movaps</span>  <span class="reg">xmm3</span>, <span class="reg">xmm2</span>
<span class="kw">movaps</span>  <span class="reg">xmm0</span>, <span class="reg">xmm2</span>
<span class="kw">shufps</span>  <span class="reg">xmm3</span>, <span class="reg">xmm2</span>, <span class="num">55h</span> <span class="comment">; 'U'</span>
<span class="kw">movaps</span>  <span class="reg">xmm1</span>, <span class="reg">xmm2</span>
<span class="kw">shufps</span>  <span class="reg">xmm0</span>, <span class="reg">xmm2</span>, <span class="num">0</span>
<span class="kw">mulps</span>   <span class="reg">xmm3</span>, <span class="reg">xmm5</span>
<span class="kw">shufps</span>  <span class="reg">xmm1</span>, <span class="reg">xmm2</span>, <span class="num">0AAh</span>
<span class="kw">mulps</span>   <span class="reg">xmm0</span>, <span class="reg">xmm4</span>
<span class="kw">mulps</span>   <span class="reg">xmm1</span>, <span class="reg">xmm6</span>
<span class="kw">shufps</span>  <span class="reg">xmm2</span>, <span class="reg">xmm2</span>, <span class="num">0FFh</span>
<span class="kw">addps</span>   <span class="reg">xmm3</span>, <span class="reg">xmm0</span>
<span class="kw">mulps</span>   <span class="reg">xmm2</span>, <span class="reg">xmm7</span>
<span class="kw">addps</span>   <span class="reg">xmm3</span>, <span class="reg">xmm1</span>
<span class="kw">addps</span>   <span class="reg">xmm3</span>, <span class="reg">xmm2</span>
<span class="kw">movaps</span>  <span class="kw">xmmword ptr</span> <span class="mem">[rax]</span>, <span class="reg">xmm3</span>
<span class="kw">add</span>     <span class="reg">rax</span>, <span class="num">10h</span>
<span class="kw">sub</span>     <span class="reg">r8</span>, <span class="num">1</span>
<span class="kw">jnz</span>     <span class="label">short loc_712D0F0</span>
<span class="comment">; --- iteration 4 ---</span>
<span class="kw">movaps</span>  <span class="reg">xmm2</span>, <span class="kw">xmmword ptr</span> <span class="mem">[rdx+rax]</span>
<span class="kw">movaps</span>  <span class="reg">xmm3</span>, <span class="reg">xmm2</span>
<span class="kw">movaps</span>  <span class="reg">xmm0</span>, <span class="reg">xmm2</span>
<span class="kw">shufps</span>  <span class="reg">xmm3</span>, <span class="reg">xmm2</span>, <span class="num">55h</span> <span class="comment">; 'U'</span>
<span class="kw">movaps</span>  <span class="reg">xmm1</span>, <span class="reg">xmm2</span>
<span class="kw">shufps</span>  <span class="reg">xmm0</span>, <span class="reg">xmm2</span>, <span class="num">0</span>
<span class="kw">mulps</span>   <span class="reg">xmm3</span>, <span class="reg">xmm5</span>
<span class="kw">shufps</span>  <span class="reg">xmm1</span>, <span class="reg">xmm2</span>, <span class="num">0AAh</span>
<span class="kw">mulps</span>   <span class="reg">xmm0</span>, <span class="reg">xmm4</span>
<span class="kw">mulps</span>   <span class="reg">xmm1</span>, <span class="reg">xmm6</span>
<span class="kw">shufps</span>  <span class="reg">xmm2</span>, <span class="reg">xmm2</span>, <span class="num">0FFh</span>
<span class="kw">addps</span>   <span class="reg">xmm3</span>, <span class="reg">xmm0</span>
<span class="kw">mulps</span>   <span class="reg">xmm2</span>, <span class="reg">xmm7</span>
<span class="kw">addps</span>   <span class="reg">xmm3</span>, <span class="reg">xmm1</span>
<span class="kw">addps</span>   <span class="reg">xmm3</span>, <span class="reg">xmm2</span>
<span class="kw">movaps</span>  <span class="kw">xmmword ptr</span> <span class="mem">[rax]</span>, <span class="reg">xmm3</span>
<span class="kw">add</span>     <span class="reg">rax</span>, <span class="num">10h</span>
<span class="kw">sub</span>     <span class="reg">r8</span>, <span class="num">1</span>
<span class="kw">jnz</span>     <span class="label">short loc_712D0F0</span></div>
</div>

Function exit (copy the result back into the first argument, verify the security cookie, restore xmm6/xmm7 and deallocate the stack):

<div>
<div class="asm64-code"><span class="kw">add</span>     <span class="reg">rax</span>, <span class="num">10h</span>
<span class="kw">sub</span>     <span class="reg">r8</span>, <span class="num">1</span>
<span class="kw">jnz</span>     <span class="label">short loc_712D0F0</span>
<span class="kw">movaps</span>  <span class="reg">xmm0</span>, <span class="mem">[rsp+<span class="num">78h</span>+<span class="const">var_78</span>]</span>
<span class="kw">mov</span>     <span class="reg">rax</span>, <span class="reg">rcx</span>
<span class="kw">movaps</span>  <span class="reg">xmm1</span>, <span class="mem">[rsp+<span class="num">78h</span>+<span class="const">var_68</span>]</span>
<span class="kw">movaps</span>  <span class="kw">xmmword ptr</span> <span class="mem">[rcx]</span>, <span class="reg">xmm0</span>
<span class="kw">movaps</span>  <span class="reg">xmm0</span>, <span class="mem">[rsp+<span class="num">78h</span>+<span class="const">var_58</span>]</span>
<span class="kw">movaps</span>  <span class="kw">xmmword ptr</span> <span class="mem">[rcx+<span class="num">10h</span>]</span>, <span class="reg">xmm1</span>
<span class="kw">movaps</span>  <span class="reg">xmm1</span>, <span class="mem">[rsp+<span class="num">78h</span>+<span class="const">var_48</span>]</span>
<span class="kw">movaps</span>  <span class="kw">xmmword ptr</span> <span class="mem">[rcx+<span class="num">20h</span>]</span>, <span class="reg">xmm0</span>
<span class="kw">movaps</span>  <span class="kw">xmmword ptr</span> <span class="mem">[rcx+<span class="num">30h</span>]</span>, <span class="reg">xmm1</span>
<span class="kw">mov</span>     <span class="reg">rcx</span>, <span class="mem">[rsp+<span class="num">78h</span>+<span class="const">var_38</span>]</span>
<span class="kw">xor</span>     <span class="reg">rcx</span>, <span class="reg">rsp</span>        <span class="comment">; StackCookie</span>
<span class="kw">call</span>    <span class="label">j___security_check_cookie</span>
<span class="kw">movaps</span>  <span class="reg">xmm6</span>, <span class="mem">[rsp+<span class="num">78h</span>+<span class="const">var_18</span>]</span>
<span class="kw">movaps</span>  <span class="reg">xmm7</span>, <span class="mem">[rsp+<span class="num">78h</span>+<span class="const">var_28</span>]</span>
<span class="kw">add</span>     <span class="reg">rsp</span>, <span class="num">78h</span>
<span class="kw">retn</span></div>
</div>

</div>

<p>All this for a very simple coordinate space conversion?</p>

<p>We could simply reorder the rows ourselves using movaps, then use xorps with a 0x80000000 sign mask for the negation! Let’s try:</p>

<p><img src="/Redundancy-seen-in-AAA-game-engines/assets/images/part-1/camStructCopy.png" alt="ESP-Image1" /></p>

<p>It seems to be storing the result of the multiply into the camera structure, so we could simply change how the rows are stored there. The multiplied result is not used again by the current function and, being constructed on the stack, is simply deallocated, so we only care about what ends up in the camera structure.</p>

<p>So now: take the up vector, negate it, and put it in row 2; then take the forward vector and put it in row 1.</p>

<blockquote>
  <p>Counting rows from 0!</p>
</blockquote>

<p>For this to work we simply change <code class="language-plaintext highlighter-rouge">v16 = CamMatRow1;</code> to <code class="language-plaintext highlighter-rouge">v16 = CamMatRow2;</code></p>

<p>and</p>

<p><code class="language-plaintext highlighter-rouge">v17 = CamMatRow2;</code> to <code class="language-plaintext highlighter-rouge">v17 = CamMatRow1;</code></p>

<blockquote>
  <p>This simple change would not add more assembly instructions, we are simply modifying existing instructions.</p>
</blockquote>

<p>then simply xorps v17 with a sign mask (0x80000000 in each lane) to flip the sign bits.</p>

<p>And we are done!</p>

<p>So the total difference in instruction count is 133 to 1! We only added one new instruction:</p>

<pre><code class="language-asm">xorps v17, 0x80000000
</code></pre>
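<p>For readers who prefer C++ over assembly, the idea can be sketched portably as below. This is an illustration under assumptions, not the patched engine code: <code>Row4</code>, <code>Mat4</code>, <code>NegateRow</code> and <code>YUpToZUp</code> are hypothetical names, and the per-lane loop stands in for what a single xorps does to all four lanes at once.</p>

```cpp
#include <cassert>
#include <cstdint>
#include <cstring>

// Hypothetical types: one 16-byte matrix row and a 4-row matrix.
struct Row4 { float v[4]; };
struct Mat4 { Row4 row[4]; };

// Flip the IEEE-754 sign bit of every lane. This is the scalar
// equivalent of xorps with a mask holding 0x80000000 in each lane.
Row4 NegateRow(Row4 r) {
    for (float& lane : r.v) {
        std::uint32_t bits;
        std::memcpy(&bits, &lane, sizeof bits);
        bits ^= 0x80000000u;
        std::memcpy(&lane, &bits, sizeof bits);
    }
    return r;
}

// Y-up to Z-up without building a matrix or calling a multiply:
// row 1 receives the old row 2, row 2 receives the negated old row 1.
void YUpToZUp(Mat4* cam) {
    assert(cam != nullptr);
    const Row4 up = cam->row[1];
    cam->row[1] = cam->row[2];
    cam->row[2] = NegateRow(up);
}
```

<p>Unlike the multiply, this version is exact: there is no -4.37e-8 residue mixed into the rows. Note that the sign flip also turns a 0.0 in the w lane into -0.0, which still compares equal to zero.</p>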

<p>while completely nuking all the other instructions I have shown above.</p>

<blockquote>
  <p>To be clear: the instructions moving the vectors into the camera structure aren’t redundant. They are a sunk cost that has to execute regardless, so I did not include those instructions in the assembly showcase. The only net-new instruction required to achieve the coordinate conversion is a single xorps, which is why I am calling it a 133 to 1 instruction count decrease.</p>
</blockquote>

<p>Instruction Count Visualized:</p>

<style>
  .asm-comparison-container {
    display: flex;
    flex-wrap: wrap;
    gap: 20px;
    margin: 30px 0;
    align-items: flex-start;
  }
  .asm-box {
    flex: 1 1 300px;
    background: #1e1e1e;
    border-radius: 8px;
    border: 1px solid #333;
    overflow: hidden;
    display: flex;
    flex-direction: column;
  }
  .asm-header {
    background: #2d2d2d;
    padding: 10px 15px;
    font-weight: bold;
    font-size: 0.9em;
    color: #e0e0e0;
    border-bottom: 1px solid #333;
    display: flex;
    justify-content: space-between;
  }
  .asm-bloat-header { border-top: 3px solid #ff4a4a; }
  .asm-opt-header { border-top: 3px solid #4aff8f; }
  .asm-code-content {
    padding: 15px;
    margin: 0;
    overflow-y: auto;
    font-family: Consolas, Monaco, 'Andale Mono', 'Ubuntu Mono', monospace;
    font-size: 0.85em;
    color: #d4d4d4;
  }
  .bloat-code { max-height: 400px; }
  .opt-code {}
  .asm-comment { color: #6a9955; }
  .asm-instr { color: #569cd6; }
  .asm-reg { color: #9cdcfe; }
</style>

<div class="asm-comparison-container">
  
  <div class="asm-box">
    <div class="asm-header asm-bloat-header">
      <span>Before</span>
      <span>133 Instructions</span>
    </div>
    <pre class="asm-code-content bloat-code"><code><span class="asm-comment">; 1. Load Camera Matrix</span>
<span class="asm-instr">movaps</span>  <span class="asm-reg">xmm0</span>, xmmword ptr [rdi]
<span class="asm-instr">movaps</span>  <span class="asm-reg">xmm1</span>, xmmword ptr [rbx+0C0h]
<span class="asm-instr">movaps</span>  [rsp+1A0h+CameraMatrix], <span class="asm-reg">xmm0</span>
<span class="asm-instr">movaps</span>  <span class="asm-reg">xmm0</span>, xmmword ptr [rbx+0D0h]
<span class="asm-instr">movaps</span>  [rsp+1A0h+CamMatRow1], <span class="asm-reg">xmm1</span>
<span class="asm-instr">movaps</span>  <span class="asm-reg">xmm1</span>, xmmword ptr [rbx+0E0h]
<span class="asm-instr">movaps</span>  [rsp+1A0h+CamMatRow2], <span class="asm-reg">xmm0</span>
<span class="asm-instr">movaps</span>  [rsp+1A0h+CamMatRow3], <span class="asm-reg">xmm1</span>

<span class="asm-comment">; 2. Construct Identity/Swizzle Matrix</span>
<span class="asm-instr">mov</span>     qword ptr [rsp+1A0h+matrixPointer], 3F800000h
<span class="asm-instr">mov</span>     dword ptr [rbp+0A0h+var_120], 0
<span class="asm-instr">mov</span>     dword ptr [rbp+0A0h+var_110], 0
<span class="asm-instr">mov</span>     qword ptr [rbp+0A0h+var_100], 0
<span class="asm-instr">mov</span>     dword ptr [rbp+0A0h+var_120+4], 0B33BBD2Eh
<span class="asm-instr">mov</span>     dword ptr [rbp+0A0h+var_110+4], 0BF800000h
<span class="asm-instr">mov</span>     qword ptr [rsp+1A0h+matrixPointer+8], 0
<span class="asm-instr">mov</span>     qword ptr [rbp+0A0h+var_120+8], 3F800000h
<span class="asm-instr">mov</span>     dword ptr [rbp+0A0h+var_110+8], 0B33BBD2Eh
<span class="asm-instr">mov</span>     dword ptr [rbp+0A0h+var_100+8], 0
<span class="asm-instr">mov</span>     dword ptr [rbp+0A0h+var_110+0Ch], 0
<span class="asm-instr">mov</span>     dword ptr [rbp+0A0h+var_100+0Ch], 3F800000h

<span class="asm-comment">; 3. Setup Arguments &amp; Call MatrixMultiply4x4</span>
<span class="asm-instr">lea</span>     <span class="asm-reg">rdx</span>, [rsp+1A0h+matrixPointer]
<span class="asm-instr">lea</span>     <span class="asm-reg">rcx</span>, [rsp+1A0h+CameraMatrix]
<span class="asm-instr">call</span>    MatrixMultiply4x4

<span class="asm-comment">; 4. Inside Function: ABI Overhead</span>
<span class="asm-instr">sub</span>     <span class="asm-reg">rsp</span>, 78h
<span class="asm-instr">movaps</span>  [rsp+78h+var_18], <span class="asm-reg">xmm6</span>
<span class="asm-instr">movaps</span>  [rsp+78h+var_28], <span class="asm-reg">xmm7</span>
<span class="asm-instr">mov</span>     <span class="asm-reg">rax</span>, cs:__security_cookie
<span class="asm-instr">xor</span>     <span class="asm-reg">rax</span>, rsp
<span class="asm-instr">mov</span>     [rsp+78h+var_38], <span class="asm-reg">rax</span>
<span class="asm-instr">movaps</span>  <span class="asm-reg">xmm4</span>, xmmword ptr [rcx]
<span class="asm-instr">lea</span>     <span class="asm-reg">r8</span>, [rsp+78h+var_78]
<span class="asm-instr">movaps</span>  <span class="asm-reg">xmm5</span>, xmmword ptr [rcx+10h]
<span class="asm-instr">lea</span>     <span class="asm-reg">rax</span>, [rsp+78h+var_78]
<span class="asm-instr">movaps</span>  <span class="asm-reg">xmm6</span>, xmmword ptr [rcx+20h]
<span class="asm-instr">sub</span>     <span class="asm-reg">rdx</span>, <span class="asm-reg">r8</span>
<span class="asm-instr">movaps</span>  <span class="asm-reg">xmm7</span>, xmmword ptr [rcx+30h]
<span class="asm-instr">mov</span>     <span class="asm-reg">r8d</span>, 4

<span class="asm-comment">; 5. SIMD Unrolled Loop (x4 Iterations)</span>
<span class="asm-comment">; --- iteration 1 ---</span>
<span class="asm-instr">movaps</span>  <span class="asm-reg">xmm2</span>, xmmword ptr [rdx+rax]
<span class="asm-instr">movaps</span>  <span class="asm-reg">xmm3</span>, <span class="asm-reg">xmm2</span>
<span class="asm-instr">movaps</span>  <span class="asm-reg">xmm0</span>, <span class="asm-reg">xmm2</span>
<span class="asm-instr">shufps</span>  <span class="asm-reg">xmm3</span>, <span class="asm-reg">xmm2</span>, 55h
<span class="asm-instr">movaps</span>  <span class="asm-reg">xmm1</span>, <span class="asm-reg">xmm2</span>
<span class="asm-instr">shufps</span>  <span class="asm-reg">xmm0</span>, <span class="asm-reg">xmm2</span>, 0
<span class="asm-instr">mulps</span>   <span class="asm-reg">xmm3</span>, <span class="asm-reg">xmm5</span>
<span class="asm-instr">shufps</span>  <span class="asm-reg">xmm1</span>, <span class="asm-reg">xmm2</span>, 0AAh
<span class="asm-instr">mulps</span>   <span class="asm-reg">xmm0</span>, <span class="asm-reg">xmm4</span>
<span class="asm-instr">mulps</span>   <span class="asm-reg">xmm1</span>, <span class="asm-reg">xmm6</span>
<span class="asm-instr">shufps</span>  <span class="asm-reg">xmm2</span>, <span class="asm-reg">xmm2</span>, 0FFh
<span class="asm-instr">addps</span>   <span class="asm-reg">xmm3</span>, <span class="asm-reg">xmm0</span>
<span class="asm-instr">mulps</span>   <span class="asm-reg">xmm2</span>, <span class="asm-reg">xmm7</span>
<span class="asm-instr">addps</span>   <span class="asm-reg">xmm3</span>, <span class="asm-reg">xmm1</span>
<span class="asm-instr">addps</span>   <span class="asm-reg">xmm3</span>, <span class="asm-reg">xmm2</span>
<span class="asm-instr">movaps</span>  xmmword ptr [rax], <span class="asm-reg">xmm3</span>
<span class="asm-instr">add</span>     <span class="asm-reg">rax</span>, 10h
<span class="asm-instr">sub</span>     <span class="asm-reg">r8</span>, 1
<span class="asm-instr">jnz</span>     short loc_712D0F0
<span class="asm-comment">; --- iteration 2 ---</span>
<span class="asm-instr">movaps</span>  <span class="asm-reg">xmm2</span>, xmmword ptr [rdx+rax]
<span class="asm-instr">movaps</span>  <span class="asm-reg">xmm3</span>, <span class="asm-reg">xmm2</span>
<span class="asm-instr">movaps</span>  <span class="asm-reg">xmm0</span>, <span class="asm-reg">xmm2</span>
<span class="asm-instr">shufps</span>  <span class="asm-reg">xmm3</span>, <span class="asm-reg">xmm2</span>, 55h
<span class="asm-instr">movaps</span>  <span class="asm-reg">xmm1</span>, <span class="asm-reg">xmm2</span>
<span class="asm-instr">shufps</span>  <span class="asm-reg">xmm0</span>, <span class="asm-reg">xmm2</span>, 0
<span class="asm-instr">mulps</span>   <span class="asm-reg">xmm3</span>, <span class="asm-reg">xmm5</span>
<span class="asm-instr">shufps</span>  <span class="asm-reg">xmm1</span>, <span class="asm-reg">xmm2</span>, 0AAh
<span class="asm-instr">mulps</span>   <span class="asm-reg">xmm0</span>, <span class="asm-reg">xmm4</span>
<span class="asm-instr">mulps</span>   <span class="asm-reg">xmm1</span>, <span class="asm-reg">xmm6</span>
<span class="asm-instr">shufps</span>  <span class="asm-reg">xmm2</span>, <span class="asm-reg">xmm2</span>, 0FFh
<span class="asm-instr">addps</span>   <span class="asm-reg">xmm3</span>, <span class="asm-reg">xmm0</span>
<span class="asm-instr">mulps</span>   <span class="asm-reg">xmm2</span>, <span class="asm-reg">xmm7</span>
<span class="asm-instr">addps</span>   <span class="asm-reg">xmm3</span>, <span class="asm-reg">xmm1</span>
<span class="asm-instr">addps</span>   <span class="asm-reg">xmm3</span>, <span class="asm-reg">xmm2</span>
<span class="asm-instr">movaps</span>  xmmword ptr [rax], <span class="asm-reg">xmm3</span>
<span class="asm-instr">add</span>     <span class="asm-reg">rax</span>, 10h
<span class="asm-instr">sub</span>     <span class="asm-reg">r8</span>, 1
<span class="asm-instr">jnz</span>     short loc_712D0F0
<span class="asm-comment">; --- iteration 3 ---</span>
<span class="asm-instr">movaps</span>  <span class="asm-reg">xmm2</span>, xmmword ptr [rdx+rax]
<span class="asm-instr">movaps</span>  <span class="asm-reg">xmm3</span>, <span class="asm-reg">xmm2</span>
<span class="asm-instr">movaps</span>  <span class="asm-reg">xmm0</span>, <span class="asm-reg">xmm2</span>
<span class="asm-instr">shufps</span>  <span class="asm-reg">xmm3</span>, <span class="asm-reg">xmm2</span>, 55h
<span class="asm-instr">movaps</span>  <span class="asm-reg">xmm1</span>, <span class="asm-reg">xmm2</span>
<span class="asm-instr">shufps</span>  <span class="asm-reg">xmm0</span>, <span class="asm-reg">xmm2</span>, 0
<span class="asm-instr">mulps</span>   <span class="asm-reg">xmm3</span>, <span class="asm-reg">xmm5</span>
<span class="asm-instr">shufps</span>  <span class="asm-reg">xmm1</span>, <span class="asm-reg">xmm2</span>, 0AAh
<span class="asm-instr">mulps</span>   <span class="asm-reg">xmm0</span>, <span class="asm-reg">xmm4</span>
<span class="asm-instr">mulps</span>   <span class="asm-reg">xmm1</span>, <span class="asm-reg">xmm6</span>
<span class="asm-instr">shufps</span>  <span class="asm-reg">xmm2</span>, <span class="asm-reg">xmm2</span>, 0FFh
<span class="asm-instr">addps</span>   <span class="asm-reg">xmm3</span>, <span class="asm-reg">xmm0</span>
<span class="asm-instr">mulps</span>   <span class="asm-reg">xmm2</span>, <span class="asm-reg">xmm7</span>
<span class="asm-instr">addps</span>   <span class="asm-reg">xmm3</span>, <span class="asm-reg">xmm1</span>
<span class="asm-instr">addps</span>   <span class="asm-reg">xmm3</span>, <span class="asm-reg">xmm2</span>
<span class="asm-instr">movaps</span>  xmmword ptr [rax], <span class="asm-reg">xmm3</span>
<span class="asm-instr">add</span>     <span class="asm-reg">rax</span>, 10h
<span class="asm-instr">sub</span>     <span class="asm-reg">r8</span>, 1
<span class="asm-instr">jnz</span>     short loc_712D0F0
<span class="asm-comment">; --- iteration 4 ---</span>
<span class="asm-instr">movaps</span>  <span class="asm-reg">xmm2</span>, xmmword ptr [rdx+rax]
<span class="asm-instr">movaps</span>  <span class="asm-reg">xmm3</span>, <span class="asm-reg">xmm2</span>
<span class="asm-instr">movaps</span>  <span class="asm-reg">xmm0</span>, <span class="asm-reg">xmm2</span>
<span class="asm-instr">shufps</span>  <span class="asm-reg">xmm3</span>, <span class="asm-reg">xmm2</span>, 55h
<span class="asm-instr">movaps</span>  <span class="asm-reg">xmm1</span>, <span class="asm-reg">xmm2</span>
<span class="asm-instr">shufps</span>  <span class="asm-reg">xmm0</span>, <span class="asm-reg">xmm2</span>, 0
<span class="asm-instr">mulps</span>   <span class="asm-reg">xmm3</span>, <span class="asm-reg">xmm5</span>
<span class="asm-instr">shufps</span>  <span class="asm-reg">xmm1</span>, <span class="asm-reg">xmm2</span>, 0AAh
<span class="asm-instr">mulps</span>   <span class="asm-reg">xmm0</span>, <span class="asm-reg">xmm4</span>
<span class="asm-instr">mulps</span>   <span class="asm-reg">xmm1</span>, <span class="asm-reg">xmm6</span>
<span class="asm-instr">shufps</span>  <span class="asm-reg">xmm2</span>, <span class="asm-reg">xmm2</span>, 0FFh
<span class="asm-instr">addps</span>   <span class="asm-reg">xmm3</span>, <span class="asm-reg">xmm0</span>
<span class="asm-instr">mulps</span>   <span class="asm-reg">xmm2</span>, <span class="asm-reg">xmm7</span>
<span class="asm-instr">addps</span>   <span class="asm-reg">xmm3</span>, <span class="asm-reg">xmm1</span>
<span class="asm-instr">addps</span>   <span class="asm-reg">xmm3</span>, <span class="asm-reg">xmm2</span>
<span class="asm-instr">movaps</span>  xmmword ptr [rax], <span class="asm-reg">xmm3</span>
<span class="asm-instr">add</span>     <span class="asm-reg">rax</span>, 10h
<span class="asm-instr">sub</span>     <span class="asm-reg">r8</span>, 1
<span class="asm-instr">jnz</span>     short loc_712D0F0

<span class="asm-comment">; 6. Deallocate &amp; Return</span>
<span class="asm-instr">movaps</span>  <span class="asm-reg">xmm0</span>, [rsp+78h+var_78]
<span class="asm-instr">mov</span>     <span class="asm-reg">rax</span>, <span class="asm-reg">rcx</span>
<span class="asm-instr">movaps</span>  <span class="asm-reg">xmm1</span>, [rsp+78h+var_68]
<span class="asm-instr">movaps</span>  xmmword ptr [rcx], <span class="asm-reg">xmm0</span>
<span class="asm-instr">movaps</span>  <span class="asm-reg">xmm0</span>, [rsp+78h+var_58]
<span class="asm-instr">movaps</span>  xmmword ptr [rcx+10h], <span class="asm-reg">xmm1</span>
<span class="asm-instr">movaps</span>  <span class="asm-reg">xmm1</span>, [rsp+78h+var_48]
<span class="asm-instr">movaps</span>  xmmword ptr [rcx+20h], <span class="asm-reg">xmm0</span>
<span class="asm-instr">movaps</span>  xmmword ptr [rcx+30h], <span class="asm-reg">xmm1</span>
<span class="asm-instr">mov</span>     <span class="asm-reg">rcx</span>, [rsp+78h+var_38]
<span class="asm-instr">xor</span>     <span class="asm-reg">rcx</span>, <span class="asm-reg">rsp</span>
<span class="asm-instr">call</span>    j___security_check_cookie
<span class="asm-instr">movaps</span>  <span class="asm-reg">xmm6</span>, [rsp+78h+var_18]
<span class="asm-instr">movaps</span>  <span class="asm-reg">xmm7</span>, [rsp+78h+var_28]
<span class="asm-instr">add</span>     <span class="asm-reg">rsp</span>, 78h
<span class="asm-instr">retn</span></code></pre>
  </div>

  <div class="asm-box">
    <div class="asm-header asm-opt-header">
      <span>After</span>
      <span>3 Instructions</span>
    </div>
    <pre class="asm-code-content opt-code"><code><span class="asm-comment">; Just swap the pointers and flip the sign bit</span>
<span class="asm-instr">movaps</span>  <span class="asm-reg">v16</span>, CamMatRow2
<span class="asm-instr">movaps</span>  <span class="asm-reg">v17</span>, CamMatRow1
<span class="asm-instr">xorps</span>   <span class="asm-reg">v17</span>, 0x80000000</code></pre>
  </div>

</div>

<p>You might say it’s for readability or that it makes it easier to modify the coordinate conversion later. But for a programmer who grasps the underlying math, the intent behind swapping rows and flipping a sign is perfectly clear, especially since we can express the exact same logic cleanly using SSE intrinsics in C++. Memorizing textbook formulas is fine, but if you don’t understand the actual spatial intent behind them, you’re just pattern-matching.</p>
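<p>As a rough illustration of that last claim, here is a minimal C++ sketch of the row swap and sign flip. The <code class="language-plaintext highlighter-rouge">Mat4</code> struct and the function name are hypothetical (they are not the engine's types), and the sketch assumes the constant 0xB33BBD2E (about -4.37e-8) in the swizzle matrix is meant to be zero.</p>

```cpp
#include <xmmintrin.h>  // SSE: __m128, _mm_xor_ps, _mm_set1_ps

// Hypothetical row-major 4x4 matrix, one __m128 per row (not the engine's real type).
struct Mat4 {
    __m128 row[4];
};

// Equivalent to left-multiplying by the constant swizzle matrix
//   (1, 0, 0, 0)
//   (0, 0, 1, 0)
//   (0,-1, 0, 0)
//   (0, 0, 0, 1)
// i.e. new row1 = old row2 and new row2 = -(old row1).
inline Mat4 SwapAndFlipRows(const Mat4& cam) {
    // -0.0f has only the sign bit (0x80000000) set in each lane.
    const __m128 signMask = _mm_set1_ps(-0.0f);

    Mat4 out;
    out.row[0] = cam.row[0];                        // unchanged
    out.row[1] = cam.row[2];                        // plain copy
    out.row[2] = _mm_xor_ps(cam.row[1], signMask);  // copy with every lane negated
    out.row[3] = cam.row[3];                        // unchanged
    return out;
}
```

<p>With a sketch like this the compiler has no constant matrix to build, no call to make, and no loop to run; whether it ends up as exactly three instructions depends on where the rows already live.</p>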

<h3 id="case-2-atanf-just-to-get-back-fovx">Case 2: Atanf just to get back FovX:</h3>

<p>I’ve already talked about this in <a href="https://zero-irp.github.io/Proj-Blog/part-5.1-reversing-projection-matrix/#xscale-construction">Reversing The Prespective Projection Matrix (Part 5.1)</a>, but I want to shine a light on it in more detail.</p>

<p><img src="/Redundancy-seen-in-AAA-game-engines/assets/images/part-1/atanfRedun.png" alt="ESP-Image1" /></p>

<p>The decompiler does not show the arguments for tanf and atanf, but I read the assembly (the argument is loaded into xmm0 just before each call) and wrote the arguments as comments on the right.</p>

<p>Let’s start with this block:</p>

<div class="ida-code"><span class="var">fovX_calc</span> = (<span class="type">__m128</span>)*(<span class="kw">unsigned</span> <span class="kw">int</span> *)(<span class="var">a1</span> + <span class="num">0x234</span>);
<span class="var">fovX_calc</span>.<span class="var">m128_f32</span>[<span class="num">0</span>] = <span class="var">fovX_calc</span>.<span class="var">m128_f32</span>[<span class="num">0</span>] * <span class="num">0.5</span>;
<span class="fn">ucrtBase_Tanf</span>();    <span class="comment">// Arg: fovX_calc</span>
</div>

<p>Using CE for dynamic analysis, we can see that (a1 + 0x234) is FovX in radians.</p>

<p>So it loads FovX in radians into “fovX_calc”, then immediately multiplies it by 0.5, so “fovX_calc” now holds the value fovX/2.</p>

<p>Next, <code class="language-plaintext highlighter-rouge">Tanf()</code> is called with <code class="language-plaintext highlighter-rouge">arg = fovX_calc</code>, so fovX_calc now holds the value <code class="language-plaintext highlighter-rouge">tan(fovX/2)</code>.</p>

<p>The engine then does a lot of calculations using <code class="language-plaintext highlighter-rouge">tan(fovX/2)</code>, decides it actually needs the value of fovX back, and gets it in an ingenious way!</p>

<div class="ida-code"><span class="fn">ucrtBase_aTanf</span>();    <span class="comment">// Arg: fovX_calc</span>
<span class="var">fovX_val</span>.<span class="var">m128_f32</span>[<span class="num">0</span>] = <span class="var">fovX_calc</span>.<span class="var">m128_f32</span>[<span class="num">0</span>] * <span class="num">2.0</span>;
</div>

<blockquote>
  <p>The fovX_calc variable is never used again in the function</p>
</blockquote>

<p>We all know the identity arctan(tan(x)) = x holds if and only if x lies strictly inside (−90°, 90°). FovX can only have a value from 60 to 120 degrees in-game. Since we divided it by 2, the angle is between 30 and 60 degrees (well within the -π/2 to π/2 principal range of arctan), so here the identity holds without exception.</p>

<p>The codebase culture accepts:<br />
\(x = 2 \cdot \arctan\!\left(\tan\!\left(\frac{x}{2}\right)\right)\)<br />
as a valid way to move data from point A to point B.</p>

<p>So just to get back the raw fovX, they call <code class="language-plaintext highlighter-rouge">atanf</code> and then multiply the result by 2. This could easily have been avoided by loading fovX from the camera structure again, which costs nowhere near a full IEEE 754 atanf computation since the value is almost certainly still in the L1 cache, or alternatively by keeping a copy in another unused xmm register.</p>
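<p>To make the comparison concrete, here is a small C++ sketch of both approaches. The <code class="language-plaintext highlighter-rouge">Camera</code> and <code class="language-plaintext highlighter-rouge">FovTerms</code> structs and the function names are hypothetical; only the idea (round-trip through tan/atan versus keeping the original value) comes from the decompiled code above.</p>

```cpp
#include <cmath>

// Hypothetical camera layout; only the one field discussed here is modelled.
struct Camera {
    float fovX;  // horizontal FOV in radians (the value read from a1 + 0x234)
};

// Hypothetical result type: both values the function ends up needing.
struct FovTerms {
    float tanHalfFovX;  // tan(fovX / 2), used by the projection math
    float fovX;         // the original angle, kept instead of recomputed
};

// What the decompiled code effectively does: fovX -> tan(fovX/2) -> 2 * atan(...).
inline float FovXRoundTrip(const Camera& cam) {
    const float t = std::tan(cam.fovX * 0.5f);
    return std::atan(t) * 2.0f;
}

// The alternative: compute the tangent once and carry the input along.
inline FovTerms FovXKept(const Camera& cam) {
    return FovTerms{ std::tan(cam.fovX * 0.5f), cam.fovX };
}
```

<p>For the 60 to 120 degree range both versions give the same angle up to float rounding, but the second one returns the input bit-for-bit and skips the atanf call entirely.</p>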

<blockquote>
  <p>Keep in mind this is not using the formula to find FovY which is:<br />
\(FOV_Y = 2 \cdot \tan^{-1}\!\left(\frac{\tan\!\left(\frac{FOV_X}{2}\right)}{A}\right)\)<br />
There is no Aspect Ratio used here.</p>
</blockquote>
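<p>For contrast, this is roughly what a legitimate use of atanf would look like, i.e. the FovY formula quoted above. It is only a sketch: the function name is made up, and it assumes A is the width/height aspect ratio.</p>

```cpp
#include <cmath>

// FOV_Y = 2 * atan(tan(FOV_X / 2) / A), with A assumed to be width / height.
// Angles are in radians.
inline float FovYFromFovX(float fovX, float aspect) {
    return 2.0f * std::atan(std::tan(fovX * 0.5f) / aspect);
}
```

<p>Here the atanf call does real work because the tangent is scaled by the aspect ratio first. In the engine code there is no such scaling, which is why the call reduces to the identity.</p>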

<p>We can’t tell exactly how many cycles atanf() takes here. It can range from roughly 30 to 150 depending on your CPU architecture.</p>

<p>One meaningless atanf() on a function that only runs a few times per frame? That’s negligible.</p>

<p>Multiple redundant atanf() calls scattered across the codebase? That’s adding up.</p>

<p>Multiple redundant atanf, tanf, sinf, cosf, matrix multiplies, matrix inversions, dot products, cross products, etc.? That’s huge.</p>

<h3 id="the-point">The Point:</h3>

<p>The point here isn’t, “OmG tHeY uSeD a FuLl iEeE 754 bIt PeRfEct aTanf() CaLL jUsT tO gEt BaCk fovX!”</p>

<p>Let’s be real: doing that in a function that only runs a few times per frame doesn’t actually cost much, whether it’s an atanf call or a MatrixMultiply4x4.</p>

<p><strong>The real point is: “Does it stop here?”, “Does this practice not carry over to all other systems?”</strong></p>

<p>It does. This same over-engineering and “clean C++” bleeds into entirely different generic functions across the codebase. We’re talking atanf, tanf, sinf, cosf, 1/sqrtf, normalization, matrix multiply, matrix inversions, dot products, cross products, and honestly a whole, whole lot more.</p>

<p>And it’s probably not just math libraries being used like this.</p>

<p>This is the exact definition of “Death by a Thousand Cuts”. The Codebase ensures you’re bleeding from everywhere.</p>

<p>Now that you know the point of this blog, I will continue with Ghost of Tsushima and the Avalanche Engine.</p>

<div class="post-nav">
  <a class="next-only" href="/Redundancy-seen-in-AAA-game-engines/part-2-Ghost-of-tsushima/">Part 2: Ghost Of Tsushima &raquo;</a>
</div>]]></content><author><name>z1rp</name></author><summary type="html"><![CDATA[]]></summary></entry><entry><title type="html">Part 2: Ghost Of Tsushima (Vector Extraction Through Multiplication)</title><link href="https://zero-irp.github.io/Redundancy-seen-in-AAA-game-engines/part-2-Ghost-of-tsushima/" rel="alternate" type="text/html" title="Part 2: Ghost Of Tsushima (Vector Extraction Through Multiplication)" /><published>2026-05-25T18:30:00+00:00</published><updated>2026-05-25T18:30:00+00:00</updated><id>https://zero-irp.github.io/Redundancy-seen-in-AAA-game-engines/part-2-Ghost-of-tsushima</id><content type="html" xml:base="https://zero-irp.github.io/Redundancy-seen-in-AAA-game-engines/part-2-Ghost-of-tsushima/"><![CDATA[<style>
.post-nav {
  display: flex;
  justify-content: space-between;
  margin-top: 40px;
  padding-top: 20px;
  border-top: 1px solid #444;
  font-family: Consolas, "Liberation Mono", Menlo, monospace;
  font-size: 15px;
}
.post-nav a {
  color: #569cd6;
  text-decoration: none;
  padding: 10px 16px;
  background: #1e1e1e;
  border-radius: 6px;
  transition: background 0.2s ease;
}
.post-nav a:hover {
  background: #2d2d2d;
}
</style>

<style>
.ida-code{
  background:#1e1e1e;
  color:#dcdcdc;
  padding:12px;
  border-radius:8px;
  font-family: Consolas, "Liberation Mono", Menlo, monospace;
  font-size:14px;
  line-height:1.4;
  overflow-x:auto;
  white-space: pre; /* preserve spaces and linebreaks */
}

/* token classes you can use inside the div */
.ida-code .kw    { color:#569cd6; } /* keywords */
.ida-code .type  { color:#4ec9b0; } /* types */
.ida-code .fn    { color:#dcdcaa; } /* functions / intrinsics */
.ida-code .num   { color:#b5cea8; } /* numbers / hex */
.ida-code .var   { color:#9cdcfe; } /* variables */
.ida-code .const { color:#ce9178; } /* globals / constants */
.ida-code .comment{ color:#6a9955; font-style:italic; }
</style>

<h3 id="case-1-vector-extraction-through-multiplication">Case 1: Vector Extraction Through Multiplication?</h3>

<p>In Ghost of Tsushima, while I was looking at how the View-Projection Matrix was being constructed, I came across a common pattern where they do a full row-vector-by-matrix multiplication that could be replaced by a simple <code class="language-plaintext highlighter-rouge">movaps</code> instruction.</p>

<p>Take this example (IDA pseudocode):</p>

<div class="ida-code"> <span class="var">v16</span>[<span class="num">3</span>] = <span class="fn">_mm_add_ps</span>(
             <span class="fn">_mm_add_ps</span>(
               <span class="fn">_mm_add_ps</span>(
		 <span class="fn">_mm_mul_ps</span>(<span class="fn">_mm_shuffle_ps</span>((<span class="type">__m128</span>)<span class="var">xmmword_1138D10</span>, (<span class="type">__m128</span>)<span class="var">xmmword_1138D10</span>, <span class="num">0</span>), <span class="var">ProjMat_0</span>), 
		 <span class="fn">_mm_mul_ps</span>(<span class="fn">_mm_shuffle_ps</span>((<span class="type">__m128</span>)<span class="var">xmmword_1138D10</span>, (<span class="type">__m128</span>)<span class="var">xmmword_1138D10</span>, <span class="num">0x55</span>), <span class="var">ProjMat_1</span>)),
               <span class="fn">_mm_mul_ps</span>(<span class="fn">_mm_shuffle_ps</span>((<span class="type">__m128</span>)<span class="var">xmmword_1138D10</span>, (<span class="type">__m128</span>)<span class="var">xmmword_1138D10</span>, <span class="num">0xAA</span>), <span class="var">ProjMat_2</span>)),
             <span class="fn">_mm_mul_ps</span>(<span class="fn">_mm_shuffle_ps</span>((<span class="type">__m128</span>)<span class="var">xmmword_1138D10</span>, (<span class="type">__m128</span>)<span class="var">xmmword_1138D10</span>, <span class="num">0xFF</span>), <span class="var">ProjMat_3</span>));
</div>

<blockquote>
  <p>This is part of the construction of the View-Projection Matrix where the translation row of the matrix needed to be zeroed out (likely for skybox rendering).
In other words, View is multiplied with Projection using only the directional vectors, with Translation zeroed out.</p>
</blockquote>

<p>The logic here is simply:</p>

<p><strong>Step 1: Shuffle</strong></p>

<ul>
  <li>_mm_shuffle_ps(someRow1, someRow1, imm) selects one component of someRow1 and replicates it across all 4 slots of a new __m128.</li>
  <li>The different imm values pick different elements:
    <ul>
      <li>0x00 -&gt; picks element 0 (X)</li>
      <li>0x55 -&gt; picks element 1 (Y)</li>
      <li>0xAA -&gt; picks element 2 (Z)</li>
      <li>0xFF -&gt; picks element 3 (W)</li>
    </ul>
  </li>
</ul>

<p>After shuffling, each __m128 looks like [X,X,X,X], [Y,Y,Y,Y], etc.</p>

<p><strong>Step 2: Multiply with Projection Matrix</strong></p>

<ul>
  <li>Each shuffled vector is multiplied component-wise with a row of the projection matrix:</li>
</ul>

<div class="ida-code"><span class="fn">_mm_mul_ps</span>(<span class="var">shuffledRow</span>, <span class="var">ProjMat_n</span>)
</div>

<ul>
  <li>This performs 4 parallel multiplications of the same broadcast component with each element of that projection matrix row.</li>
</ul>

<p><strong>Step 3: Sum the results</strong></p>

<ul>
  <li>The _mm_add_ps calls sum all four products together:</li>
</ul>

<div class="ida-code">(<span class="var">X</span> * <span class="var">ProjMat_0</span>) + (<span class="var">Y</span> * <span class="var">ProjMat_1</span>) + (<span class="var">Z</span> * <span class="var">ProjMat_2</span>) + (<span class="var">W</span> * <span class="var">ProjMat_3</span>)</div>

<ul>
  <li>The result is a single row of the final View-Projection matrix.</li>
</ul>
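<p>The three steps above can be sketched as one small C++ function. The <code class="language-plaintext highlighter-rouge">Mat4</code> struct and the function name are hypothetical, and the sketch assumes each ProjMat_n is one 4-float slice of the matrix held in an <code class="language-plaintext highlighter-rouge">__m128</code>.</p>

```cpp
#include <xmmintrin.h>  // SSE: _mm_shuffle_ps, _mm_mul_ps, _mm_add_ps

// Hypothetical 4x4 matrix, one __m128 per stored slice (ProjMat_0 .. ProjMat_3).
struct Mat4 {
    __m128 row[4];
};

// v * M for a 4-component vector v:
//   result = X * row[0] + Y * row[1] + Z * row[2] + W * row[3]
inline __m128 MulVecMat(__m128 v, const Mat4& m) {
    // Step 1: broadcast each component across all four lanes.
    const __m128 x = _mm_shuffle_ps(v, v, 0x00);
    const __m128 y = _mm_shuffle_ps(v, v, 0x55);
    const __m128 z = _mm_shuffle_ps(v, v, 0xAA);
    const __m128 w = _mm_shuffle_ps(v, v, 0xFF);

    // Steps 2 and 3: scale each slice by its component, then sum the four products.
    return _mm_add_ps(
        _mm_add_ps(
            _mm_add_ps(_mm_mul_ps(x, m.row[0]), _mm_mul_ps(y, m.row[1])),
            _mm_mul_ps(z, m.row[2])),
        _mm_mul_ps(w, m.row[3]));
}
```

<p>That is 4 shuffles, 4 multiplies and 3 adds per output vector, which is perfectly reasonable when the input vector is arbitrary.</p>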

<h5 id="the-problem">The Problem:</h5>

<p>The problem here is that <code class="language-plaintext highlighter-rouge">xmmword_1138D10</code> has a value of: (0.0, 0.0, 0.0, 1.0).</p>

<p>Since the first three components are zero, the multiplications with ProjMat_0, ProjMat_1, and ProjMat_2 drop out. The only one left is the last one, where w = 1.0, which means you’re just selecting the last row of the projection matrix (ProjMat_3).</p>

<p>So this whole instruction chain simplifies to basically 1 <code class="language-plaintext highlighter-rouge">movaps</code> instruction:</p>

<div class="ida-code"> <span class="var">v16</span>[<span class="num">3</span>] = <span class="var">ProjMat_3</span></div>

<h3 id="case-2-even-more-vector-extraction-through-multiplication">Case 2: Even More Vector Extraction Through Multiplication??</h3>

<p><img src="/Redundancy-seen-in-AAA-game-engines/assets/images/part-2/ida_view_matrix_math.png" alt="ESP-Image1" /></p>

<p>This is the later stage, where the translation that was previously zeroed out is added back into the View-Projection Matrix, and it is done in a very confusing way.</p>

<div class="ida-code"><span class="var">v17</span> = <span class="fn">_mm_add_ps</span>(
        <span class="fn">_mm_add_ps</span>(
          <span class="fn">_mm_add_ps</span>(
	    <span class="fn">_mm_mul_ps</span>(<span class="fn">_mm_shuffle_ps</span>(<span class="var">CamPos_Negated_w1_dup</span>, <span class="var">CamPos_Negated_w1_dup</span>, <span class="num">0x55</span>), <span class="var">VP_NoTrans_Row1</span>), 
	    <span class="fn">_mm_mul_ps</span>(<span class="fn">_mm_shuffle_ps</span>(<span class="var">CamPos_Negated_w1_dup</span>, <span class="var">CamPos_Negated_w1_dup</span>, <span class="num">0</span>), *<span class="var">VP_NoTrans</span>)),
          <span class="fn">_mm_mul_ps</span>(<span class="fn">_mm_shuffle_ps</span>(<span class="var">CamPos_Negated_w1_dup</span>, <span class="var">CamPos_Negated_w1_dup</span>, <span class="num">0xAA</span>), <span class="var">VP_NoTrans_Row2</span>)),
        <span class="fn">_mm_mul_ps</span>(<span class="fn">_mm_shuffle_ps</span>(<span class="var">CamPos_Negated_w1_dup</span>, <span class="var">CamPos_Negated_w1_dup</span>, <span class="num">0xFF</span>), <span class="var">VP_NoTrans_Row3</span>));
</div>

<p>This is the only Vector Multiplication that matters, the one where it’s adding back the translation into the VP matrix.</p>

<p>Here is the biggest reduction:</p>

<div class="ida-code"> <span class="var">v18</span> = <span class="fn">_mm_add_ps</span>(
	 <span class="fn">_mm_mul_ps</span>(<span class="fn">_mm_shuffle_ps</span>(<span class="var">Mask_0100</span>, <span class="var">Mask_0100</span>, <span class="num">0x55</span>), <span class="var">VP_NoTrans_Row1</span>), 
	 <span class="fn">_mm_mul_ps</span>(<span class="fn">_mm_shuffle_ps</span>(<span class="var">Mask_0100</span>, <span class="var">Mask_0100</span>, <span class="num">0</span>), *<span class="var">VP_NoTrans</span>));
  *(<span class="type">__m128</span> *)(<span class="var">a1</span> + <span class="num">0x260</span>) = <span class="fn">_mm_add_ps</span>(
                              <span class="fn">_mm_add_ps</span>(
                                <span class="fn">_mm_add_ps</span>(
                                  <span class="fn">_mm_mul_ps</span>(<span class="fn">_mm_shuffle_ps</span>(<span class="var">Mask_0010</span>, <span class="var">Mask_0010</span>, <span class="num">0x55</span>), <span class="var">VP_NoTrans_Row1</span>),
                                  <span class="fn">_mm_mul_ps</span>(<span class="fn">_mm_shuffle_ps</span>(<span class="var">Mask_0010</span>, <span class="var">Mask_0010</span>, <span class="num">0</span>), *<span class="var">VP_NoTrans</span>)),
                                <span class="fn">_mm_mul_ps</span>(<span class="fn">_mm_shuffle_ps</span>(<span class="var">Mask_0010</span>, <span class="var">Mask_0010</span>, <span class="num">0xAA</span>), <span class="var">VP_NoTrans_Row2</span>)),
                              <span class="fn">_mm_mul_ps</span>(<span class="fn">_mm_shuffle_ps</span>(<span class="var">Mask_0010</span>, <span class="var">Mask_0010</span>, <span class="num">0xFF</span>), <span class="var">VP_NoTrans_Row3</span>));
  *(<span class="type">__m128</span> *)(<span class="var">a1</span> + <span class="num">0x270</span>) = <span class="var">v17</span>;
  *(<span class="type">__m128</span> *)(<span class="var">a1</span> + <span class="num">0x250</span>) = <span class="fn">_mm_add_ps</span>(
                              <span class="fn">_mm_add_ps</span>(<span class="var">v18</span>, <span class="fn">_mm_mul_ps</span>(<span class="fn">_mm_shuffle_ps</span>(<span class="var">Mask_0100</span>, <span class="var">Mask_0100</span>, <span class="num">0xAA</span>), <span class="var">VP_NoTrans_Row2</span>)),
                              <span class="fn">_mm_mul_ps</span>(<span class="fn">_mm_shuffle_ps</span>(<span class="var">Mask_0100</span>, <span class="var">Mask_0100</span>, <span class="num">0xFF</span>), <span class="var">VP_NoTrans_Row3</span>));
  *(<span class="type">__m128</span> *)(<span class="var">a1</span> + <span class="num">0x240</span>) = <span class="fn">_mm_add_ps</span>(
                              <span class="fn">_mm_add_ps</span>(
                                <span class="fn">_mm_add_ps</span>(
                                  <span class="fn">_mm_mul_ps</span>(<span class="fn">_mm_shuffle_ps</span>(<span class="var">Mask_1000</span>, <span class="var">Mask_1000</span>, <span class="num">0x55</span>), <span class="var">VP_NoTrans_Row1</span>),
                                  <span class="fn">_mm_mul_ps</span>(<span class="fn">_mm_shuffle_ps</span>(<span class="var">Mask_1000</span>, <span class="var">Mask_1000</span>, <span class="num">0</span>), <span class="var">VP_NoTrans_Row0</span>)),
                                <span class="fn">_mm_mul_ps</span>(<span class="fn">_mm_shuffle_ps</span>(<span class="var">Mask_1000</span>, <span class="var">Mask_1000</span>, <span class="num">0xAA</span>), <span class="var">VP_NoTrans_Row2</span>)),
                              <span class="fn">_mm_mul_ps</span>(<span class="fn">_mm_shuffle_ps</span>(<span class="var">Mask_1000</span>, <span class="var">Mask_1000</span>, <span class="num">0xFF</span>), <span class="var">VP_NoTrans_Row3</span>));

</div>

<p>This is just extracting the values stored in the VP rows using Masks.</p>

<p>Mask_1000 is (1, 0, 0, 0)<br />
Mask_0100 is (0, 1, 0, 0)<br />
Mask_0010 is (0, 0, 1, 0)</p>

<p>Multiplying a unit vector by a matrix simply extracts the corresponding row. The original code was laboriously performing this extraction manually for each axis:</p>

<ul>
  <li>The calculation for 0x240 used Mask_1000 to extract Row0.</li>
  <li>The calculation for 0x250 used Mask_0100 to extract Row1.</li>
  <li>The calculation for 0x260 used Mask_0010 to extract Row2.</li>
</ul>

<p>So it just collapses into 3 <code class="language-plaintext highlighter-rouge">movaps</code> instructions and 1 vector-matrix multiplication.</p>

<div class="ida-code">  *(<span class="type">__m128</span> *)(<span class="var">a1</span> + <span class="num">0x240</span>) = <span class="var">VP_NoTrans_Row0</span>
  *(<span class="type">__m128</span> *)(<span class="var">a1</span> + <span class="num">0x250</span>) = <span class="var">VP_NoTrans_Row1</span>
  *(<span class="type">__m128</span> *)(<span class="var">a1</span> + <span class="num">0x260</span>) = <span class="var">VP_NoTrans_Row2</span>
  *(<span class="type">__m128</span> *)(<span class="var">a1</span> + <span class="num">0x270</span>) = <span class="var">v17</span>
</div>
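<p>Written out as C++, the collapsed version could look something like the sketch below. <code class="language-plaintext highlighter-rouge">Mat4</code>, <code class="language-plaintext highlighter-rouge">MulVecMat</code> and <code class="language-plaintext highlighter-rouge">AddTranslationBack</code> are hypothetical names, not the game's own; the parameter names simply mirror the labels used in the pseudocode above.</p>

```cpp
#include <xmmintrin.h>  // SSE: _mm_shuffle_ps, _mm_mul_ps, _mm_add_ps

// Hypothetical 4x4 matrix, one __m128 per row.
struct Mat4 {
    __m128 row[4];
};

// General vector-by-matrix product: X*row0 + Y*row1 + Z*row2 + W*row3.
inline __m128 MulVecMat(__m128 v, const Mat4& m) {
    const __m128 x = _mm_shuffle_ps(v, v, 0x00);
    const __m128 y = _mm_shuffle_ps(v, v, 0x55);
    const __m128 z = _mm_shuffle_ps(v, v, 0xAA);
    const __m128 w = _mm_shuffle_ps(v, v, 0xFF);
    return _mm_add_ps(
        _mm_add_ps(
            _mm_add_ps(_mm_mul_ps(x, m.row[0]), _mm_mul_ps(y, m.row[1])),
            _mm_mul_ps(z, m.row[2])),
        _mm_mul_ps(w, m.row[3]));
}

// Rows 0..2 are plain copies (the unit-vector "masks" only select a row);
// only the translation row needs an actual multiplication.
inline Mat4 AddTranslationBack(const Mat4& vpNoTrans, __m128 camPosNegatedW1) {
    Mat4 out;
    out.row[0] = vpNoTrans.row[0];
    out.row[1] = vpNoTrans.row[1];
    out.row[2] = vpNoTrans.row[2];
    out.row[3] = MulVecMat(camPosNegatedW1, vpNoTrans);
    return out;
}
```

<p>For finite inputs the copied rows match what the mask multiplications produce, since multiplying by 1 and adding zeros changes nothing, so the three extra products buy nothing.</p>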

<p>Again: I was not looking specifically for over-engineered code; this just stood out a lot.</p>

<blockquote>
  <p>I have also optimized this in assembly in my previous blog (for fun):<br />
<a href="https://zero-irp.github.io/ViewProj-Blog/part-4.5-detour-hooking-simd-operations/">Reversing The ViewProjection Matrix - Part 4.5: Detour Hooking to Optimize SIMD Operations</a></p>
</blockquote>

<div class="post-nav">
  <a href="/Redundancy-seen-in-AAA-game-engines/part-1-dunia/">&laquo; Part 1: Dunia Engine</a>
  <a href="/Redundancy-seen-in-AAA-game-engines/part-3-Avalanche-engine/">Part 3: Avalanche Engine &raquo;</a>
</div>]]></content><author><name>z1rp</name></author><summary type="html"><![CDATA[]]></summary></entry><entry><title type="html">Part 3: Avalanche Engine (Matrix Multiplication Replaces Vector Addition)</title><link href="https://zero-irp.github.io/Redundancy-seen-in-AAA-game-engines/part-3-Avalanche-engine/" rel="alternate" type="text/html" title="Part 3: Avalanche Engine (Matrix Multiplication Replaces Vector Addition)" /><published>2026-05-25T18:30:00+00:00</published><updated>2026-05-25T18:30:00+00:00</updated><id>https://zero-irp.github.io/Redundancy-seen-in-AAA-game-engines/part-3-Avalanche-engine</id><content type="html" xml:base="https://zero-irp.github.io/Redundancy-seen-in-AAA-game-engines/part-3-Avalanche-engine/"><![CDATA[<style>
.post-nav {
  display: flex;
  justify-content: space-between;
  margin-top: 40px;
  padding-top: 20px;
  border-top: 1px solid #444;
  font-family: Consolas, "Liberation Mono", Menlo, monospace;
  font-size: 15px;
}
.post-nav a {
  color: #569cd6;
  text-decoration: none;
  padding: 10px 16px;
  background: #1e1e1e;
  border-radius: 6px;
  transition: background 0.2s ease;
}
.post-nav a:hover {
  background: #2d2d2d;
}
</style>

<style>
.ida-code{
  background:#1e1e1e;
  color:#dcdcdc;
  padding:12px;
  border-radius:8px;
  font-family: Consolas, "Liberation Mono", Menlo, monospace;
  font-size:14px;
  line-height:1.4;
  overflow-x:auto;
  white-space: pre; /* preserve spaces and linebreaks */
}

/* token classes you can use inside the div */
.ida-code .kw    { color:#569cd6; } /* keywords */
.ida-code .type  { color:#4ec9b0; } /* types */
.ida-code .fn    { color:#dcdcaa; } /* functions / intrinsics */
.ida-code .num   { color:#b5cea8; } /* numbers / hex */
.ida-code .var   { color:#9cdcfe; } /* variables */
.ida-code .const { color:#ce9178; } /* globals / constants */
.ida-code .comment{ color:#6a9955; font-style:italic; }
</style>

<style>
.asm64-wrap { display: flex; flex-direction: column; gap: 18px; padding: 4px 0; }
.asm64-section-label {
  font-size: 13px;
  color: var(--color-text-secondary);
  margin-bottom: 4px;
  font-family: var(--font-sans);
}
.asm64-code {
  background: #1e1e1e;
  color: #c5c8c6;
  padding: 12px;
  border-radius: 8px;
  font-family: "Cascadia Code", Consolas, "Liberation Mono", Menlo, monospace;
  font-size: 13px;
  line-height: 1.45;
  overflow-x: auto;
  white-space: pre;
  margin: 0;
}
.asm64-code .kw     { color: #d78700; font-weight: 600; }
.asm64-code .reg    { color: #5fafaf; }
.asm64-code .mem    { color: #af87d7; }
.asm64-code .num    { color: #b5cea8; }
.asm64-code .label  { color: #ffaf5f; }
.asm64-code .comment{ color: #6a9955; font-style: italic; }
.asm64-code .const  { color: #ce9178; }
</style>

<h3 id="temporal-anti-aliasing-taa-matrix-multiplication-replaces-vector-addition">Temporal Anti-Aliasing (TAA): Matrix Multiplication Replaces Vector Addition</h3>

<p><img src="/Redundancy-seen-in-AAA-game-engines/assets/images/part-3/jitter.png" alt="ESP-Image1" /></p>

<p>Here we see a very “textbook” way of adding jitter to the projection matrix for <code class="language-plaintext highlighter-rouge">SMAA_T2X</code> in the Avalanche Engine.</p>

<p>The classic textbook way is:</p>

\[\begin{bmatrix}
x_{scale} &amp; 0 &amp; 0 &amp; 0 \\
0 &amp; y_{scale} &amp; 0 &amp; 0 \\
0 &amp; 0 &amp; \dfrac{z_{far}}{z_{far}-z_{near}} &amp; 1 \\
0 &amp; 0 &amp; -\dfrac{z_{near}z_{far}}{z_{far}-z_{near}} &amp; 0
\end{bmatrix}
\times
\begin{bmatrix}
1 &amp; 0 &amp; 0 &amp; 0 \\
0 &amp; 1 &amp; 0 &amp; 0 \\
0 &amp; 0 &amp; 1 &amp; 0 \\
j_x &amp; j_y &amp; 0 &amp; 1
\end{bmatrix}
=
\begin{bmatrix}
x_{scale} &amp; 0 &amp; 0 &amp; 0 \\
0 &amp; y_{scale} &amp; 0 &amp; 0 \\
j_x &amp; j_y &amp; \dfrac{z_{far}}{z_{far}-z_{near}} &amp; 1 \\
0 &amp; 0 &amp; -\dfrac{z_{near}z_{far}}{z_{far}-z_{near}} &amp; 0
\end{bmatrix}\]

<p>This looks clean in the source code, but in a low-level CPU render loop it is inefficient.</p>

<p>In C++ it would probably look something like this:</p>

<div class="ida-code"><span class="type">Matrix4x4</span> <span class="var">jitterMatrix</span> = <span class="var">Matrix4x4</span>::<span class="fn">Identity</span>();
<span class="var">jitterMatrix</span>.<span class="var">m</span>[<span class="num">3</span>][<span class="num">0</span>] = <span class="var">jX</span>;
<span class="var">jitterMatrix</span>.<span class="var">m</span>[<span class="num">3</span>][<span class="num">1</span>] = <span class="var">jY</span>;


<span class="var">projMatrix</span> = <span class="var">projMatrix</span> * <span class="var">jitterMatrix</span>;</div>

<p>Again: <strong>Clean C++ Code ≠ Clean Compiled Code</strong></p>

<ul>
  <li>First, an entire 4x4 identity matrix has to be constructed on the stack just to hold two float values.</li>
  <li>Then both matrices are passed as arguments to the multiply function.</li>
  <li>Inside the function: allocate stack space, set up the security cookie, load registers, etc.</li>
  <li>Compute every row-by-column dot product (16 dot products, 4 per output row).</li>
  <li>Finally, end the function by deallocating the stack, verifying the security cookie, and writing the result to memory and registers.</li>
</ul>

<blockquote>
  <p>Note: Calculating <code class="language-plaintext highlighter-rouge">curFrame &amp; 1</code>, selecting jitters, scaling them down to sub-pixel space are all mathematically necessary steps and are not over-engineered.</p>
</blockquote>
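<p>To make those necessary steps concrete, here is a minimal C++ sketch of them. Everything in it is an assumption for illustration, not the engine's code: the names <code class="language-plaintext highlighter-rouge">Jitter</code> and <code class="language-plaintext highlighter-rouge">selectJitter</code> are hypothetical, the ±0.25 pixel offsets are assumed to be the usual two-sample T2x pattern, and the <code class="language-plaintext highlighter-rouge">2 / size</code> scale assumes clip space spans 2 units across the viewport.</p>

```cpp
#include <array>
#include <cassert>
#include <cstdint>

// Hypothetical type: a jitter offset in clip-space units.
struct Jitter {
    float x;
    float y;
};

// Picks one of two sub-pixel offsets based on frame parity and scales it
// from pixels to clip space. Offsets and scale factor are assumptions.
Jitter selectJitter(std::uint32_t curFrame, float viewportWidth, float viewportHeight) {
    assert(viewportWidth > 0.0f && viewportHeight > 0.0f);

    // Assumed two-sample pattern, in pixels.
    static constexpr std::array<Jitter, 2> kOffsets{{ {0.25f, -0.25f}, {-0.25f, 0.25f} }};

    // curFrame & 1 alternates between the two samples every frame.
    const Jitter& px = kOffsets[curFrame & 1u];

    // Clip space is 2 units wide, so one pixel is 2 / size.
    return { px.x * 2.0f / viewportWidth, px.y * 2.0f / viewportHeight };
}
```

<p>None of this work can be skipped. The waste is only in how the resulting two floats are applied to the projection matrix.</p>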

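<p>The jitter matrix differs from identity only in its last row, and row 2 is the only row of the projection matrix with a non-zero last column (the 1), so the multiplication changes nothing except row 2. The easier way to do it would be:</p>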

<ul>
  <li>Take the scaled down jitters.</li>
  <li>Take row 2 (counting from 0) of the projection matrix and add the jitters to it with a single <code class="language-plaintext highlighter-rouge">addps</code>.</li>
</ul>

<p>Example (where the jitters were already scaled down):</p>

<div class="asm64-wrap">
  <pre class="asm64-code"><span class="kw">movaps</span> <span class="reg">xmm0</span>, <span class="const">projMat_2</span>  <span class="comment">; Load 2nd Row [0, 0, Z_scale, 1]</span>
<span class="kw">movaps</span> <span class="reg">xmm1</span>, <span class="const">jitter_Row</span> <span class="comment">; [jitX, jitY, 0, 0]</span>

<span class="kw">addps</span> <span class="reg">xmm0</span>, <span class="reg">xmm1</span> <span class="comment">; result: [jX, jY, Z_scale, 1]</span></pre>
</div>

<p>That drops the instruction count from roughly 120 to 3.</p>
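<p>For anyone who prefers C++ over assembly, here is a minimal sketch comparing the two approaches. The names <code class="language-plaintext highlighter-rouge">Mat4</code>, <code class="language-plaintext highlighter-rouge">multiply</code>, <code class="language-plaintext highlighter-rouge">jitterByMultiply</code> and <code class="language-plaintext highlighter-rouge">jitterByRowAdd</code> are hypothetical and not taken from the engine. The layout is row-major, matching the matrices above. The loop in <code class="language-plaintext highlighter-rouge">jitterByRowAdd</code> is the scalar equivalent of the <code class="language-plaintext highlighter-rouge">addps</code>; whether a compiler vectorizes it depends on the compiler and flags.</p>

```cpp
#include <array>
#include <cassert>

using Row  = std::array<float, 4>;
using Mat4 = std::array<Row, 4>;

// Generic row-major 4x4 multiply: 64 multiplies and 64 accumulating adds.
Mat4 multiply(const Mat4& a, const Mat4& b) {
    Mat4 r{};  // zero-initialized accumulator
    for (int i = 0; i < 4; ++i)
        for (int j = 0; j < 4; ++j)
            for (int k = 0; k < 4; ++k)
                r[i][j] += a[i][k] * b[k][j];
    return r;
}

// Textbook path: build an identity matrix, patch in the jitters, multiply.
Mat4 jitterByMultiply(const Mat4& proj, float jx, float jy) {
    Mat4 jitter{};
    for (int i = 0; i < 4; ++i) jitter[i][i] = 1.0f;
    jitter[3][0] = jx;
    jitter[3][1] = jy;
    return multiply(proj, jitter);
}

// Shortcut: add [jx, jy, 0, 0] to row 2 of the projection matrix.
Mat4 jitterByRowAdd(Mat4 proj, float jx, float jy) {
    // Only valid when the last column of the projection matrix is (0, 0, 1, 0).
    assert(proj[0][3] == 0.0f && proj[1][3] == 0.0f &&
           proj[2][3] == 1.0f && proj[3][3] == 0.0f);
    const Row jitterRow{{jx, jy, 0.0f, 0.0f}};
    for (int c = 0; c < 4; ++c) proj[2][c] += jitterRow[c];
    return proj;
}
```

<p>Both functions return the same matrix for a projection matrix of this shape. The assert documents the one condition the shortcut relies on.</p>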

<div class="post-nav">
  <a href="/Redundancy-seen-in-AAA-game-engines/part-2-Ghost-of-tsushima/">&laquo; Part 2: Ghost Of Tsushima</a>
  <a href="/Redundancy-seen-in-AAA-game-engines/part-4-using-general-matrix-inverse-when-you-should-not/">Part 4: Using generic matrix inverses &raquo;</a>
</div>]]></content><author><name>z1rp</name></author><summary type="html"><![CDATA[]]></summary></entry><entry><title type="html">Part 4: Using generic matrix inverse when you don’t need to</title><link href="https://zero-irp.github.io/Redundancy-seen-in-AAA-game-engines/part-4-using-general-matrix-inverse-when-you-should-not/" rel="alternate" type="text/html" title="Part 4: Using generic matrix inverse when you don’t need to" /><published>2026-05-25T18:30:00+00:00</published><updated>2026-05-25T18:30:00+00:00</updated><id>https://zero-irp.github.io/Redundancy-seen-in-AAA-game-engines/part-4-using-general-matrix-inverse-when-you-should-not</id><content type="html" xml:base="https://zero-irp.github.io/Redundancy-seen-in-AAA-game-engines/part-4-using-general-matrix-inverse-when-you-should-not/"><![CDATA[<style>
.post-nav {
  display: flex;
  justify-content: space-between;
  margin-top: 40px;
  padding-top: 20px;
  border-top: 1px solid #444;
  font-family: Consolas, "Liberation Mono", Menlo, monospace;
  font-size: 15px;
}
.post-nav a {
  color: #569cd6;
  text-decoration: none;
  padding: 10px 16px;
  background: #1e1e1e;
  border-radius: 6px;
  transition: background 0.2s ease;
}
.post-nav a:hover {
  background: #2d2d2d;
}
</style>

<style>
.ida-code{
  background:#1e1e1e;
  color:#dcdcdc;
  padding:12px;
  border-radius:8px;
  font-family: Consolas, "Liberation Mono", Menlo, monospace;
  font-size:14px;
  line-height:1.4;
  overflow-x:auto;
  white-space: pre; /* preserve spaces and linebreaks */
}

/* token classes you can use inside the div */
.ida-code .kw    { color:#ff9e3b; } /* keywords (Bright Orange) -> Or use #f9d849 for Bright Yellow */
.ida-code .type  { color:#4ec9b0; } /* types */
.ida-code .fn    { color:#dcdcaa; } /* functions / intrinsics */
.ida-code .num   { color:#b5cea8; } /* numbers / hex */
.ida-code .var   { color:#9cdcfe; } /* variables */
.ida-code .const { color:#ce9178; } /* globals / constants */
.ida-code .comment{ color:#6a9955; font-style:italic; }
</style>

<p>This is a small tangent about the number of times I have seen a game engine call a generic matrix inverse function where a specialized, inlined inverse would be way, WAY faster!</p>

<p>Here is a generic matrix inverse using Cramer’s Rule:</p>

<div class="ida-code"><span class="type">_UNKNOWN</span> **<span class="kw">__fastcall</span> <span class="fn">sub_65C3E20</span>(<span class="type">__m128</span> *<span class="var">a1</span>)
{
  <span class="var">v2</span> = *<span class="var">a1</span>;
  <span class="var">v3</span> = <span class="var">a1</span>[<span class="num">1</span>];
  <span class="var">v4</span> = <span class="fn">_mm_shuffle_ps</span>(<span class="var">v2</span>, <span class="var">v2</span>, <span class="num">0xFF</span>).<span class="var">m128_f32</span>[<span class="num">0</span>];
  <span class="var">v5</span> = <span class="fn">_mm_shuffle_ps</span>(<span class="var">v3</span>, <span class="var">v3</span>, <span class="num">0xFF</span>).<span class="var">m128_f32</span>[<span class="num">0</span>];
  <span class="var">v40</span> = *<span class="var">a1</span>;
  <span class="var">v42</span> = <span class="var">v3</span>.<span class="var">m128_f32</span>[<span class="num">0</span>];
  <span class="var">v6</span> = <span class="var">a1</span>[<span class="num">2</span>];
  <span class="var">v7</span> = <span class="var">a1</span>[<span class="num">3</span>];
  <span class="var">v8</span> = <span class="fn">_mm_shuffle_ps</span>(<span class="var">v7</span>, <span class="var">v7</span>, <span class="num">0xFF</span>).<span class="var">m128_f32</span>[<span class="num">0</span>];
  <span class="var">v9</span> = <span class="fn">_mm_shuffle_ps</span>(<span class="var">v7</span>, <span class="var">v7</span>, <span class="num">0xAA</span>).<span class="var">m128_f32</span>[<span class="num">0</span>];
  <span class="var">v10</span> = <span class="fn">_mm_shuffle_ps</span>(<span class="var">v6</span>, <span class="var">v6</span>, <span class="num">0xFF</span>).<span class="var">m128_f32</span>[<span class="num">0</span>];
  <span class="var">v11</span> = <span class="fn">_mm_shuffle_ps</span>(<span class="var">v2</span>, <span class="var">v2</span>, <span class="num">0xAA</span>).<span class="var">m128_f32</span>[<span class="num">0</span>];
  <span class="var">v38</span> = <span class="var">v7</span>;
  <span class="var">v7</span>.<span class="var">m128_f32</span>[<span class="num">0</span>] = <span class="fn">_mm_shuffle_ps</span>(<span class="var">v3</span>, <span class="var">v3</span>, <span class="num">0xAA</span>).<span class="var">m128_f32</span>[<span class="num">0</span>];
  <span class="var">v39</span> = <span class="fn">_mm_shuffle_ps</span>(<span class="var">v6</span>, <span class="var">v6</span>, <span class="num">0xAA</span>).<span class="var">m128_f32</span>[<span class="num">0</span>];
  <span class="var">v12</span> = <span class="var">v7</span>.<span class="var">m128_f32</span>[<span class="num">0</span>] * <span class="var">v8</span>;
  <span class="var">v13</span> = <span class="var">v39</span> * <span class="var">v8</span>;
  <span class="var">v14</span> = <span class="var">v5</span> * <span class="var">v9</span>;
  <span class="var">v37</span> = <span class="var">v6</span>.<span class="var">m128_f32</span>[<span class="num">0</span>];
  <span class="var">v41</span> = <span class="fn">_mm_shuffle_ps</span>(<span class="var">v3</span>, <span class="var">v3</span>, <span class="num">0x55</span>).<span class="var">m128_f32</span>[<span class="num">0</span>];
  <span class="var">v46</span> = <span class="var">v8</span>;
  <span class="var">v15</span> = <span class="var">v11</span> * <span class="var">v8</span>;
  <span class="var">v44</span> = <span class="var">v11</span>;
  <span class="var">v16</span> = <span class="var">v4</span> * <span class="var">v9</span>;
  <span class="var">v48</span> = <span class="var">v9</span>;
  <span class="var">v17</span> = <span class="var">v10</span> * <span class="var">v9</span>;
  <span class="var">v18</span> = <span class="var">v11</span> * <span class="var">v5</span>;
  <span class="var">v49</span> = <span class="var">v7</span>.<span class="var">m128_f32</span>[<span class="num">0</span>];
  <span class="var">v7</span>.<span class="var">m128_f32</span>[<span class="num">0</span>] = <span class="var">v7</span>.<span class="var">m128_f32</span>[<span class="num">0</span>] * <span class="var">v10</span>;
  <span class="var">v43</span> = <span class="var">v10</span>;
  <span class="var">v19</span> = <span class="var">v11</span> * <span class="var">v10</span>;
  <span class="var">v20</span> = <span class="var">v4</span> * <span class="var">v39</span>;
  <span class="var">v47</span> = <span class="var">v4</span>;
  <span class="var">v21</span> = <span class="var">v4</span> * <span class="var">v49</span>;
  <span class="var">v31</span> = <span class="var">v12</span>;
  <span class="var">v45</span> = <span class="fn">_mm_shuffle_ps</span>(<span class="var">v6</span>, <span class="var">v6</span>, <span class="num">0x55</span>).<span class="var">m128_f32</span>[<span class="num">0</span>];
  <span class="var">v50</span> = <span class="fn">_mm_shuffle_ps</span>(<span class="var">v38</span>, <span class="var">v38</span>, <span class="num">0x55</span>).<span class="var">m128_f32</span>[<span class="num">0</span>];
  <span class="var">v30</span> = (<span class="kw">float</span>)((<span class="kw">float</span>)((<span class="kw">float</span>)(<span class="var">v45</span> * <span class="var">v14</span>) + (<span class="kw">float</span>)(<span class="var">v41</span> * <span class="var">v13</span>)) + (<span class="kw">float</span>)(<span class="var">v50</span> * <span class="var">v7</span>.<span class="var">m128_f32</span>[<span class="num">0</span>]))
      - (<span class="kw">float</span>)((<span class="kw">float</span>)((<span class="kw">float</span>)(<span class="var">v45</span> * <span class="var">v12</span>) + (<span class="kw">float</span>)(<span class="var">v41</span> * <span class="var">v17</span>)) + (<span class="kw">float</span>)(<span class="var">v50</span> * (<span class="kw">float</span>)(<span class="var">v5</span> * <span class="var">v39</span>)));
  <span class="var">v32</span> = (<span class="kw">float</span>)((<span class="kw">float</span>)((<span class="kw">float</span>)(<span class="var">v45</span> * <span class="var">v15</span>) + (<span class="kw">float</span>)(<span class="var">v40</span>.<span class="var">m128_f32</span>[<span class="num">1</span>] * <span class="var">v17</span>)) + (<span class="kw">float</span>)(<span class="var">v50</span> * <span class="var">v20</span>))
      - (<span class="kw">float</span>)((<span class="kw">float</span>)((<span class="kw">float</span>)(<span class="var">v45</span> * <span class="var">v16</span>) + (<span class="kw">float</span>)(<span class="var">v40</span>.<span class="var">m128_f32</span>[<span class="num">1</span>] * <span class="var">v13</span>)) + (<span class="kw">float</span>)(<span class="var">v50</span> * <span class="var">v19</span>));
  <span class="var">v33</span> = (<span class="kw">float</span>)((<span class="kw">float</span>)((<span class="kw">float</span>)(<span class="var">v41</span> * <span class="var">v16</span>) + (<span class="kw">float</span>)(<span class="var">v40</span>.<span class="var">m128_f32</span>[<span class="num">1</span>] * <span class="var">v12</span>)) + (<span class="kw">float</span>)(<span class="var">v50</span> * <span class="var">v18</span>))
      - (<span class="kw">float</span>)((<span class="kw">float</span>)((<span class="kw">float</span>)(<span class="var">v41</span> * <span class="var">v15</span>) + (<span class="kw">float</span>)(<span class="var">v40</span>.<span class="var">m128_f32</span>[<span class="num">1</span>] * <span class="var">v14</span>)) + (<span class="kw">float</span>)(<span class="var">v50</span> * <span class="var">v21</span>));
  <span class="var">v3</span>.<span class="var">m128_f32</span>[<span class="num">0</span>] = (<span class="kw">float</span>)((<span class="kw">float</span>)((<span class="kw">float</span>)(<span class="var">v41</span> * <span class="var">v19</span>) + (<span class="kw">float</span>)(<span class="var">v40</span>.<span class="var">m128_f32</span>[<span class="num">1</span>] * (<span class="kw">float</span>)(<span class="var">v5</span> * <span class="var">v39</span>)))
                         + (<span class="kw">float</span>)(<span class="var">v45</span> * <span class="var">v21</span>))
                 - (<span class="kw">float</span>)((<span class="kw">float</span>)((<span class="kw">float</span>)(<span class="var">v41</span> * <span class="var">v20</span>) + (<span class="kw">float</span>)(<span class="var">v40</span>.<span class="var">m128_f32</span>[<span class="num">1</span>] * <span class="var">v7</span>.<span class="var">m128_f32</span>[<span class="num">0</span>])) + (<span class="kw">float</span>)(<span class="var">v45</span> * <span class="var">v18</span>));
  <span class="var">v22</span> = (<span class="kw">float</span>)((<span class="kw">float</span>)((<span class="kw">float</span>)(<span class="var">v6</span>.<span class="var">m128_f32</span>[<span class="num">0</span>] * <span class="var">v12</span>) + (<span class="kw">float</span>)(<span class="var">v42</span> * <span class="var">v17</span>))
              + (<span class="kw">float</span>)(<span class="var">v38</span>.<span class="var">m128_f32</span>[<span class="num">0</span>] * (<span class="kw">float</span>)(<span class="var">v5</span> * <span class="var">v39</span>)))
      - (<span class="kw">float</span>)((<span class="kw">float</span>)((<span class="kw">float</span>)(<span class="var">v6</span>.<span class="var">m128_f32</span>[<span class="num">0</span>] * <span class="var">v14</span>) + (<span class="kw">float</span>)(<span class="var">v42</span> * <span class="var">v13</span>)) + (<span class="kw">float</span>)(<span class="var">v38</span>.<span class="var">m128_f32</span>[<span class="num">0</span>] * <span class="var">v7</span>.<span class="var">m128_f32</span>[<span class="num">0</span>]));
  <span class="var">v23</span> = (<span class="kw">float</span>)((<span class="kw">float</span>)((<span class="kw">float</span>)(<span class="var">v6</span>.<span class="var">m128_f32</span>[<span class="num">0</span>] * <span class="var">v16</span>) + (<span class="kw">float</span>)(<span class="var">v40</span>.<span class="var">m128_f32</span>[<span class="num">0</span>] * <span class="var">v13</span>)) + (<span class="kw">float</span>)(<span class="var">v38</span>.<span class="var">m128_f32</span>[<span class="num">0</span>] * <span class="var">v19</span>))
      - (<span class="kw">float</span>)((<span class="kw">float</span>)((<span class="kw">float</span>)(<span class="var">v6</span>.<span class="var">m128_f32</span>[<span class="num">0</span>] * <span class="var">v15</span>) + (<span class="kw">float</span>)(<span class="var">v40</span>.<span class="var">m128_f32</span>[<span class="num">0</span>] * <span class="var">v17</span>)) + (<span class="kw">float</span>)(<span class="var">v38</span>.<span class="var">m128_f32</span>[<span class="num">0</span>] * <span class="var">v20</span>));
  <span class="var">v24</span> = (<span class="kw">float</span>)((<span class="kw">float</span>)((<span class="kw">float</span>)(<span class="var">v42</span> * <span class="var">v15</span>) + (<span class="kw">float</span>)(<span class="var">v40</span>.<span class="var">m128_f32</span>[<span class="num">0</span>] * <span class="var">v14</span>)) + (<span class="kw">float</span>)(<span class="var">v38</span>.<span class="var">m128_f32</span>[<span class="num">0</span>] * <span class="var">v21</span>))
      - (<span class="kw">float</span>)((<span class="kw">float</span>)((<span class="kw">float</span>)(<span class="var">v42</span> * <span class="var">v16</span>) + (<span class="kw">float</span>)(<span class="var">v40</span>.<span class="var">m128_f32</span>[<span class="num">0</span>] * <span class="var">v31</span>)) + (<span class="kw">float</span>)(<span class="var">v38</span>.<span class="var">m128_f32</span>[<span class="num">0</span>] * <span class="var">v18</span>));
  <span class="var">v6</span>.<span class="var">m128_f32</span>[<span class="num">0</span>] = <span class="var">v42</span> * <span class="var">v19</span>;
  <span class="var">v25</span> = <span class="var">v38</span>.<span class="var">m128_f32</span>[<span class="num">0</span>] * <span class="fn">COERCE_FLOAT</span>(<span class="fn">HIDWORD</span>(<span class="var">a1</span>-&gt;<span class="var">m128_u64</span>[<span class="num">0</span>]));
  <span class="var">v26</span> = (<span class="kw">float</span>)((<span class="kw">float</span>)((<span class="kw">float</span>)(<span class="var">v42</span> * <span class="var">v20</span>) + (<span class="kw">float</span>)(<span class="var">v40</span>.<span class="var">m128_f32</span>[<span class="num">0</span>] * <span class="var">v7</span>.<span class="var">m128_f32</span>[<span class="num">0</span>])) + (<span class="kw">float</span>)(<span class="var">v37</span> * <span class="var">v18</span>))
      - (<span class="kw">float</span>)((<span class="kw">float</span>)(<span class="var">v6</span>.<span class="var">m128_f32</span>[<span class="num">0</span>] + (<span class="kw">float</span>)(<span class="var">v40</span>.<span class="var">m128_f32</span>[<span class="num">0</span>] * (<span class="kw">float</span>)(<span class="var">v5</span> * <span class="var">v39</span>))) + (<span class="kw">float</span>)(<span class="var">v37</span> * <span class="var">v21</span>));
  <span class="var">v27</span> = <span class="fn">COERCE_FLOAT</span>(*<span class="var">a1</span>) * <span class="var">v50</span>;
  <span class="var">v29</span> = <span class="fn">COERCE_FLOAT</span>(*<span class="var">a1</span>) * <span class="var">v45</span>;
  <span class="var">v34</span> = <span class="var">v37</span> * <span class="fn">COERCE_FLOAT</span>(<span class="fn">HIDWORD</span>(<span class="var">a1</span>-&gt;<span class="var">m128_u64</span>[<span class="num">0</span>]));
  <span class="var">v35</span> = <span class="fn">COERCE_FLOAT</span>(*<span class="var">a1</span>) * <span class="var">v41</span>;
  <span class="var">v36</span> = <span class="var">v42</span> * <span class="fn">COERCE_FLOAT</span>(<span class="fn">HIDWORD</span>(<span class="var">a1</span>-&gt;<span class="var">m128_u64</span>[<span class="num">0</span>]));
  <span class="var">v28</span> = <span class="num">1.0</span>
      / (<span class="kw">float</span>)((<span class="kw">float</span>)((<span class="kw">float</span>)((<span class="kw">float</span>)(<span class="var">v30</span> * <span class="fn">COERCE_FLOAT</span>(*<span class="var">a1</span>)) + (<span class="kw">float</span>)(<span class="var">v32</span> * <span class="var">v42</span>)) + (<span class="kw">float</span>)(<span class="var">v33</span> * <span class="var">v37</span>))
              + (<span class="kw">float</span>)(<span class="var">v3</span>.<span class="var">m128_f32</span>[<span class="num">0</span>] * <span class="var">v38</span>.<span class="var">m128_f32</span>[<span class="num">0</span>]));
  <span class="var">a1</span>-&gt;<span class="var">m128_f32</span>[<span class="num">0</span>] = <span class="var">v30</span> * <span class="var">v28</span>;
  <span class="var">a1</span>[<span class="num">1</span>].<span class="var">m128_f32</span>[<span class="num">1</span>] = <span class="var">v28</span> * <span class="var">v23</span>;
  <span class="var">a1</span>[<span class="num">1</span>].<span class="var">m128_f32</span>[<span class="num">0</span>] = <span class="var">v28</span> * <span class="var">v22</span>;
  <span class="var">a1</span>[<span class="num">1</span>].<span class="var">m128_f32</span>[<span class="num">3</span>] = <span class="var">v28</span> * <span class="var">v26</span>;
  <span class="var">a1</span>[<span class="num">1</span>].<span class="var">m128_f32</span>[<span class="num">2</span>] = <span class="var">v28</span> * <span class="var">v24</span>;
  <span class="var">a1</span>-&gt;<span class="var">m128_f32</span>[<span class="num">3</span>] = <span class="var">v3</span>.<span class="var">m128_f32</span>[<span class="num">0</span>] * <span class="var">v28</span>;
  <span class="var">a1</span>-&gt;<span class="var">m128_f32</span>[<span class="num">1</span>] = <span class="var">v32</span> * <span class="var">v28</span>;
  <span class="var">a1</span>-&gt;<span class="var">m128_f32</span>[<span class="num">2</span>] = <span class="var">v33</span> * <span class="var">v28</span>;
  <span class="var">a1</span>[<span class="num">2</span>].<span class="var">m128_f32</span>[<span class="num">0</span>] = (<span class="kw">float</span>)((<span class="kw">float</span>)((<span class="kw">float</span>)((<span class="kw">float</span>)((<span class="kw">float</span>)(<span class="var">v38</span>.<span class="var">m128_f32</span>[<span class="num">0</span>] * <span class="var">v41</span>) * <span class="var">v43</span>)
                                            + (<span class="kw">float</span>)(<span class="var">v5</span> * (<span class="kw">float</span>)(<span class="var">v37</span> * <span class="var">v50</span>)))
                                    + (<span class="kw">float</span>)((<span class="kw">float</span>)(<span class="var">v42</span> * <span class="var">v45</span>) * <span class="var">v46</span>))
                            - (<span class="kw">float</span>)((<span class="kw">float</span>)((<span class="kw">float</span>)(<span class="var">v5</span> * (<span class="kw">float</span>)(<span class="var">v38</span>.<span class="var">m128_f32</span>[<span class="num">0</span>] * <span class="var">v45</span>))
                                            + (<span class="kw">float</span>)((<span class="kw">float</span>)(<span class="var">v42</span> * <span class="var">v50</span>) * <span class="var">v43</span>))
                                    + (<span class="kw">float</span>)((<span class="kw">float</span>)(<span class="var">v37</span> * <span class="var">v41</span>) * <span class="var">v46</span>)))
                    * <span class="var">v28</span>;
  <span class="var">a1</span>[<span class="num">2</span>].<span class="var">m128_f32</span>[<span class="num">1</span>] = (<span class="kw">float</span>)((<span class="kw">float</span>)((<span class="kw">float</span>)((<span class="kw">float</span>)(<span class="var">v47</span> * (<span class="kw">float</span>)(<span class="var">v38</span>.<span class="var">m128_f32</span>[<span class="num">0</span>] * <span class="var">v45</span>)) + (<span class="kw">float</span>)(<span class="var">v27</span> * <span class="var">v43</span>))
                                    + (<span class="kw">float</span>)(<span class="var">v34</span> * <span class="var">v46</span>))
                            - (<span class="kw">float</span>)((<span class="kw">float</span>)((<span class="kw">float</span>)(<span class="var">v25</span> * <span class="var">v43</span>) + (<span class="kw">float</span>)(<span class="var">v47</span> * (<span class="kw">float</span>)(<span class="var">v37</span> * <span class="var">v50</span>)))
                                    + (<span class="kw">float</span>)(<span class="var">v29</span> * <span class="var">v46</span>)))
                    * <span class="var">v28</span>;
  <span class="var">a1</span>[<span class="num">2</span>].<span class="var">m128_f32</span>[<span class="num">2</span>] = (<span class="kw">float</span>)((<span class="kw">float</span>)((<span class="kw">float</span>)((<span class="kw">float</span>)(<span class="var">v25</span> * <span class="var">v5</span>) + (<span class="kw">float</span>)(<span class="var">v47</span> * (<span class="kw">float</span>)(<span class="var">v42</span> * <span class="var">v50</span>))) + (<span class="kw">float</span>)(<span class="var">v35</span> * <span class="var">v46</span>))
                            - (<span class="kw">float</span>)((<span class="kw">float</span>)((<span class="kw">float</span>)(<span class="var">v47</span> * (<span class="kw">float</span>)(<span class="var">v38</span>.<span class="var">m128_f32</span>[<span class="num">0</span>] * <span class="var">v41</span>)) + (<span class="kw">float</span>)(<span class="var">v27</span> * <span class="var">v5</span>))
                                    + (<span class="kw">float</span>)(<span class="var">v36</span> * <span class="var">v46</span>)))
                    * <span class="var">v28</span>;
  <span class="var">a1</span>[<span class="num">2</span>].<span class="var">m128_f32</span>[<span class="num">3</span>] = (<span class="kw">float</span>)((<span class="kw">float</span>)((<span class="kw">float</span>)((<span class="kw">float</span>)(<span class="var">v47</span> * (<span class="kw">float</span>)(<span class="var">v37</span> * <span class="var">v41</span>)) + (<span class="kw">float</span>)(<span class="var">v29</span> * <span class="var">v5</span>)) + (<span class="kw">float</span>)(<span class="var">v36</span> * <span class="var">v43</span>))
                            - (<span class="kw">float</span>)((<span class="kw">float</span>)((<span class="kw">float</span>)(<span class="var">v34</span> * <span class="var">v5</span>) + (<span class="kw">float</span>)(<span class="var">v47</span> * (<span class="kw">float</span>)(<span class="var">v42</span> * <span class="var">v45</span>))) + (<span class="kw">float</span>)(<span class="var">v35</span> * <span class="var">v43</span>)))
                    * <span class="var">v28</span>;
  <span class="var">a1</span>[<span class="num">3</span>].<span class="var">m128_f32</span>[<span class="num">0</span>] = (<span class="kw">float</span>)((<span class="kw">float</span>)((<span class="kw">float</span>)((<span class="kw">float</span>)((<span class="kw">float</span>)(<span class="var">v37</span> * <span class="var">v41</span>) * <span class="var">v48</span>) + (<span class="kw">float</span>)((<span class="kw">float</span>)(<span class="var">v42</span> * <span class="var">v50</span>) * <span class="var">v39</span>))
                                    + (<span class="kw">float</span>)(<span class="var">v49</span> * (<span class="kw">float</span>)(<span class="var">v38</span>.<span class="var">m128_f32</span>[<span class="num">0</span>] * <span class="var">v45</span>)))
                            - (<span class="kw">float</span>)((<span class="kw">float</span>)((<span class="kw">float</span>)((<span class="kw">float</span>)(<span class="var">v42</span> * <span class="var">v45</span>) * <span class="var">v48</span>) + (<span class="kw">float</span>)(<span class="var">v49</span> * (<span class="kw">float</span>)(<span class="var">v37</span> * <span class="var">v50</span>)))
                                    + (<span class="kw">float</span>)((<span class="kw">float</span>)(<span class="var">v38</span>.<span class="var">m128_f32</span>[<span class="num">0</span>] * <span class="var">v41</span>) * <span class="var">v39</span>)))
                    * <span class="var">v28</span>;
  <span class="var">a1</span>[<span class="num">3</span>].<span class="var">m128_f32</span>[<span class="num">1</span>] = (<span class="kw">float</span>)((<span class="kw">float</span>)((<span class="kw">float</span>)((<span class="kw">float</span>)(<span class="var">v29</span> * <span class="var">v48</span>) + (<span class="kw">float</span>)(<span class="var">v44</span> * (<span class="kw">float</span>)(<span class="var">v37</span> * <span class="var">v50</span>)))
                                    + (<span class="kw">float</span>)(<span class="var">v25</span> * <span class="var">v39</span>))
                            - (<span class="kw">float</span>)((<span class="kw">float</span>)((<span class="kw">float</span>)(<span class="var">v34</span> * <span class="var">v48</span>) + (<span class="kw">float</span>)(<span class="var">v27</span> * <span class="var">v39</span>))
                                    + (<span class="kw">float</span>)(<span class="var">v44</span> * (<span class="kw">float</span>)(<span class="var">v38</span>.<span class="var">m128_f32</span>[<span class="num">0</span>] * <span class="var">v45</span>))))
                    * <span class="var">v28</span>;
  <span class="var">a1</span>[<span class="num">3</span>].<span class="var">m128_f32</span>[<span class="num">2</span>] = (<span class="kw">float</span>)((<span class="kw">float</span>)((<span class="kw">float</span>)((<span class="kw">float</span>)(<span class="var">v36</span> * <span class="var">v48</span>) + (<span class="kw">float</span>)(<span class="var">v27</span> * <span class="var">v49</span>))
                                    + (<span class="kw">float</span>)(<span class="var">v44</span> * (<span class="kw">float</span>)(<span class="var">v38</span>.<span class="var">m128_f32</span>[<span class="num">0</span>] * <span class="var">v41</span>)))
                            - (<span class="kw">float</span>)((<span class="kw">float</span>)((<span class="kw">float</span>)(<span class="var">v35</span> * <span class="var">v48</span>) + (<span class="kw">float</span>)(<span class="var">v44</span> * (<span class="kw">float</span>)(<span class="var">v42</span> * <span class="var">v50</span>)))
                                    + (<span class="kw">float</span>)(<span class="var">v25</span> * <span class="var">v49</span>)))
                    * <span class="var">v28</span>;
  <span class="var">a1</span>[<span class="num">3</span>].<span class="var">m128_f32</span>[<span class="num">3</span>] = (<span class="kw">float</span>)((<span class="kw">float</span>)((<span class="kw">float</span>)((<span class="kw">float</span>)(<span class="var">v44</span> * (<span class="kw">float</span>)(<span class="var">v42</span> * <span class="var">v45</span>)) + (<span class="kw">float</span>)(<span class="var">v35</span> * <span class="var">v39</span>))
                                    + (<span class="kw">float</span>)(<span class="var">v34</span> * <span class="var">v49</span>))
                            - (<span class="kw">float</span>)((<span class="kw">float</span>)((<span class="kw">float</span>)(<span class="var">v29</span> * <span class="var">v49</span>) + (<span class="kw">float</span>)(<span class="var">v36</span> * <span class="var">v39</span>))
                                    + (<span class="kw">float</span>)(<span class="var">v44</span> * (<span class="kw">float</span>)(<span class="var">v37</span> * <span class="var">v41</span>))))
                    * <span class="var">v28</span>;
  <span class="kw">return</span> &amp;<span class="var">retaddr</span>;
}</div>

<p>For matrices like the Projection Matrix, a large number of values are simply 0. If the matrix has a well-known structure, its inverse can be written out in closed form and inlined instead of going through a generic inversion, which saves a huge number of CPU cycles!</p>

<p>The Dunia engine does exactly this: it inlines the inverse Projection Matrix calculation by saving the important values while constructing the Projection Matrix and rearranging them.</p>
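<p>To make the idea concrete, here is a minimal sketch of a closed-form projection inverse. It is not the Dunia code: it assumes a Direct3D-style left-handed perspective matrix in row-major, row-vector form, and every name in it (<code class="language-plaintext highlighter-rouge">Mat4</code>, <code class="language-plaintext highlighter-rouge">ProjTerms</code>, <code class="language-plaintext highlighter-rouge">makeInverseProjection</code>) is hypothetical. The point is only that once the four non-trivial terms are cached, the inverse costs a few divisions and a negation.</p>

```cpp
#include <cassert>
#include <cmath>

// Hypothetical 4x4 type: row-major storage, row-vector convention (v * M).
struct Mat4 { float m[4][4]; };

// The only non-trivial terms of the assumed perspective matrix.
// Caching them during construction is what makes the inverse cheap.
struct ProjTerms { float xScale, yScale, a, b; };

ProjTerms makeTerms(float fovY, float aspect, float zNear, float zFar) {
    assert(zNear > 0.0f && zFar > zNear);
    const float yScale = 1.0f / std::tan(fovY * 0.5f);
    const float xScale = yScale / aspect;
    const float a = zFar / (zFar - zNear);  // depth scale
    const float b = -zNear * a;             // depth offset
    return {xScale, yScale, a, b};
}

// Assumed layout (Direct3D-style, left-handed):
//   | xScale 0      0 0 |
//   | 0      yScale 0 0 |
//   | 0      0      a 1 |
//   | 0      0      b 0 |
Mat4 makeProjection(const ProjTerms& t) {
    return {{{t.xScale, 0.0f, 0.0f, 0.0f},
             {0.0f, t.yScale, 0.0f, 0.0f},
             {0.0f, 0.0f, t.a, 1.0f},
             {0.0f, 0.0f, t.b, 0.0f}}};
}

// Closed-form inverse of the layout above: no determinant, no cofactors.
// The lower-right 2x2 block [[a, 1], [b, 0]] inverts to [[0, 1/b], [1, -a/b]].
Mat4 makeInverseProjection(const ProjTerms& t) {
    return {{{1.0f / t.xScale, 0.0f, 0.0f, 0.0f},
             {0.0f, 1.0f / t.yScale, 0.0f, 0.0f},
             {0.0f, 0.0f, 0.0f, 1.0f / t.b},
             {0.0f, 0.0f, 1.0f, -t.a / t.b}}};
}
```

<p>Multiplying <code class="language-plaintext highlighter-rouge">makeProjection(t)</code> by <code class="language-plaintext highlighter-rouge">makeInverseProjection(t)</code> should give the identity up to float rounding. A different projection convention (right-handed, reversed-Z, column-major) changes which slots are non-zero, but not the approach.</p>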

<p>This generic Cramer’s Rule inverse comes to roughly 470 instructions, and instruction count alone doesn’t capture the full inefficiency:</p>

<h3 id="1-simd-to-scalar">1. SIMD to Scalar</h3>

<p>The matrix rows are loaded into 128-bit wide registers, but get immediately unpacked to do scalar calculations 32 bits at a time.</p>
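<p>For contrast, here is a minimal sketch of what staying in packed form looks like for one small piece of the job: the final multiply of all 16 elements by <code class="language-plaintext highlighter-rouge">1.0/determinant</code>. This is not code from either engine. <code class="language-plaintext highlighter-rouge">scaleMatrixSse</code> and its parameters are hypothetical, and it assumes an x86 target with SSE.</p>

```cpp
#include <cassert>
#include <xmmintrin.h>  // SSE intrinsics (x86 / x86-64 only)

// Multiply all 16 floats of a row-major 4x4 matrix by a single scalar.
// `m` is a hypothetical pointer to 16 contiguous floats.
void scaleMatrixSse(float* m, float invDet) {
    assert(m != nullptr);
    const __m128 s = _mm_set1_ps(invDet);      // broadcast the scalar to 4 lanes
    for (int row = 0; row < 4; ++row) {
        __m128 r = _mm_loadu_ps(m + row * 4);  // load one row, no alignment required
        r = _mm_mul_ps(r, s);                  // four multiplies in one instruction
        _mm_storeu_ps(m + row * 4, r);         // write the row back
    }
}
```

<p>That is four packed multiplies instead of sixteen scalar ones. The cofactor terms can be vectorized in the same spirit, but that takes shuffles and is beyond a short sketch.</p>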

<h3 id="2-register-spilling">2. Register Spilling</h3>

<div class="ida-code">  <span class="kw">float</span> <span class="var">v29</span>; <span class="comment">// [rsp+4h] [rbp-1C4h]</span>
  <span class="kw">float</span> <span class="var">v30</span>; <span class="comment">// [rsp+8h] [rbp-1C0h]</span>
  <span class="kw">float</span> <span class="var">v31</span>; <span class="comment">// [rsp+Ch] [rbp-1BCh]</span>
  <span class="kw">float</span> <span class="var">v32</span>; <span class="comment">// [rsp+10h] [rbp-1B8h]</span>
  <span class="kw">float</span> <span class="var">v33</span>; <span class="comment">// [rsp+14h] [rbp-1B4h]</span>
  <span class="kw">float</span> <span class="var">v34</span>; <span class="comment">// [rsp+1Ch] [rbp-1ACh]</span>
  <span class="kw">float</span> <span class="var">v35</span>; <span class="comment">// [rsp+20h] [rbp-1A8h]</span>
  <span class="kw">float</span> <span class="var">v36</span>; <span class="comment">// [rsp+24h] [rbp-1A4h]</span>
<span class="comment">etc...</span></div>

<p>16 XMM registers are available for floating-point math. This function defines over 50 individual float variables.</p>

<p>Annotations like <code class="language-plaintext highlighter-rouge">[rsp+Ch]</code> on these variables indicate that the compiler ran out of hardware registers and was forced to “spill” intermediate calculations to the stack.</p>

<h3 id="3-dependency-chains">3. Dependency Chains</h3>

<p>Look at:</p>

<div class="ida-code">  <span class="var">v28</span> = <span class="num">1.0</span>
      / (<span class="kw">float</span>)((<span class="kw">float</span>)((<span class="kw">float</span>)((<span class="kw">float</span>)(<span class="var">v30</span> * <span class="fn">COERCE_FLOAT</span>(*<span class="var">a1</span>)) + (<span class="kw">float</span>)(<span class="var">v32</span> * <span class="var">v42</span>)) + (<span class="kw">float</span>)(<span class="var">v33</span> * <span class="var">v37</span>))
              + (<span class="kw">float</span>)(<span class="var">v3</span>.<span class="var">m128_f32</span>[<span class="num">0</span>] * <span class="var">v38</span>.<span class="var">m128_f32</span>[<span class="num">0</span>]));</div>

<p>This represents <code class="language-plaintext highlighter-rouge">1.0/determinant</code>. It cannot be computed until v30, v32, v33, and the other inputs are completely finished.</p>

<p>Subsequently, every single element written back to the final matrix depends on v28.</p>

<div class="ida-code">  <span class="var">a1</span>[<span class="num">2</span>].<span class="var">m128_f32</span>[<span class="num">1</span>] = (<span class="kw">float</span>)((<span class="kw">float</span>)((<span class="kw">float</span>)((<span class="kw">float</span>)(<span class="var">v47</span> * (<span class="kw">float</span>)(<span class="var">v38</span>.<span class="var">m128_f32</span>[<span class="num">0</span>] * <span class="var">v45</span>)) + (<span class="kw">float</span>)(<span class="var">v27</span> * <span class="var">v43</span>))
                                    + (<span class="kw">float</span>)(<span class="var">v34</span> * <span class="var">v46</span>))
                            - (<span class="kw">float</span>)((<span class="kw">float</span>)((<span class="kw">float</span>)(<span class="var">v25</span> * <span class="var">v43</span>) + (<span class="kw">float</span>)(<span class="var">v47</span> * (<span class="kw">float</span>)(<span class="var">v37</span> * <span class="var">v50</span>)))
                                    + (<span class="kw">float</span>)(<span class="var">v29</span> * <span class="var">v46</span>)))
                    * <span class="var">v28</span>;
  <span class="var">a1</span>[<span class="num">2</span>].<span class="var">m128_f32</span>[<span class="num">2</span>] = (<span class="kw">float</span>)((<span class="kw">float</span>)((<span class="kw">float</span>)((<span class="kw">float</span>)(<span class="var">v25</span> * <span class="var">v5</span>) + (<span class="kw">float</span>)(<span class="var">v47</span> * (<span class="kw">float</span>)(<span class="var">v42</span> * <span class="var">v50</span>))) + (<span class="kw">float</span>)(<span class="var">v35</span> * <span class="var">v46</span>))
                            - (<span class="kw">float</span>)((<span class="kw">float</span>)((<span class="kw">float</span>)(<span class="var">v47</span> * (<span class="kw">float</span>)(<span class="var">v38</span>.<span class="var">m128_f32</span>[<span class="num">0</span>] * <span class="var">v41</span>)) + (<span class="kw">float</span>)(<span class="var">v27</span> * <span class="var">v5</span>))
                                    + (<span class="kw">float</span>)(<span class="var">v36</span> * <span class="var">v46</span>)))
                    * <span class="var">v28</span>;
<span class="comment">etc...</span></div>

<p>I would be very surprised if this didn’t stall somewhere in the pipeline.</p>

<p>The Dunia engine inlined this to about 15-20 instructions by saving the required values in registers or on the stack and assembling the inverse at the end of projection matrix construction.
More info in my previous write-up: <a href="https://zero-irp.github.io/Proj-Blog/part-6-reversing-inverse-projection-matrix/">Part 6: Reversing Construction of the Inverse Projection Matrix</a></p>

<p>This also ties into the Camera Matrix and honestly various other matrices!</p>

<blockquote>
  <p>Reminder!<br />
Camera Matrix^-1 = View Matrix</p>
</blockquote>

<p>Let’s take the view matrix as an example. How to inline it is covered in my previous blog post, <a href="https://zero-irp.github.io/ViewProj-Blog/part-4.2-reversing-simd-instrustions/#fast-inverse-for-orthonormal-matrices">Reversing The ViewProjection Matrix (Part 4.2: Reversing SIMD Instructions for Matrix Math - Fast inverse for orthonormal Matrices)</a>, and repeated here:</p>

<h3 id="fast-inverse-for-orthonormal-matrices">Fast inverse for orthonormal Matrices</h3>

<p>If R is a pure rotation matrix, meaning:</p>

<ul>
  <li>No scaling,</li>
  <li>No shear,</li>
  <li>It’s orthonormal (columns are perpendicular and unit-length)</li>
</ul>

<p>then \(R^{-1} = R^T\)</p>

<p>Suppose we have a 4x4 matrix with homogeneous coordinates:</p>

\[C_{world} =
\begin{bmatrix}
R_{00} &amp; R_{01} &amp; R_{02} &amp; 0 \\
R_{10} &amp; R_{11} &amp; R_{12} &amp; 0 \\
R_{20} &amp; R_{21} &amp; R_{22} &amp; 0 \\
T_x &amp; T_y &amp; T_z &amp; 1.0
\end{bmatrix}\]

<p>Here:</p>
<ul>
  <li>R (upper 3×3) is the orientation of the camera in world space.</li>
  <li>T (bottom row, first 3 values) is the position of the camera in world space.</li>
</ul>

<p>To get \(C_{world}^{-1}\) we can separate the matrix like so:</p>

\[C_{world} =
\begin{bmatrix}
R &amp; 0 \\
T &amp; 1
\end{bmatrix}\]

<p>and we want its inverse.</p>

<p>The block matrix inverse formula for this special form is:</p>

\[\begin{bmatrix} 
A &amp; 0 \\ 
B &amp; 1 
\end{bmatrix}^{-1}
=
\begin{bmatrix} 
A^{-1} &amp; 0 \\ 
-BA^{-1} &amp; 1 
\end{bmatrix}\]

<p><em>(See <a href="https://en.wikipedia.org/wiki/Invertible_matrix#Blockwise_inversion">Wikipedia: Blockwise inversion</a> for the general derivation)</em></p>

<p>To apply the formula, substitute:</p>

<ul>
  <li>A = R</li>
  <li>B = T</li>
</ul>

<p>So:</p>

\[C_{world}^{-1} =
\begin{bmatrix}
R^{-1} &amp; 0 \\
-TR^{-1} &amp; 1
\end{bmatrix}\]

<p>Since R is orthonormal (\(R^{-1} = R^T\)):</p>

\[C_{world}^{-1} =
\begin{bmatrix}
R^T &amp; 0 \\
-TR^T &amp; 1
\end{bmatrix}\]

<blockquote>
  <p>The exponent “T” denotes the transpose; the regular “T” denotes the translation.</p>
</blockquote>

<p>Now expand \(-TR^T\) into its dot products.</p>

<p>If:</p>

\[R =
\begin{bmatrix}
R_{0x} &amp; R_{0y} &amp; R_{0z} \\
R_{1x} &amp; R_{1y} &amp; R_{1z} \\
R_{2x} &amp; R_{2y} &amp; R_{2z} \\
\end{bmatrix}\]

<p>and \(T = [T_x, T_y, T_z]\), then:</p>

\[-TR^T=
\begin{bmatrix} 
-T_x &amp; -T_y &amp; -T_z \\
\end{bmatrix}
\times
\begin{bmatrix}
R_{0x} &amp; R_{1x} &amp; R_{2x} \\
R_{0y} &amp; R_{1y} &amp; R_{2y} \\
R_{0z} &amp; R_{1z} &amp; R_{2z} \\
\end{bmatrix}\]

<p>which gives (with \(R_0\), \(R_1\), \(R_2\) being the rows of R):</p>

\[-TR^T = [-dot(T,R_0), -dot(T,R_1), -dot(T,R_2)]\]

<p>So the last row of the 4x4 inverse, including the homogeneous 1, becomes:</p>

\[[-dot(T,R_0), -dot(T,R_1), -dot(T,R_2), 1]\]

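<p>Expanding \(R^T\), which is just the transpose of the rotation, completes the inverse:</p>

\[C_{world}^{-1} =
\begin{bmatrix}
R_{0x} &amp; R_{1x} &amp; R_{2x} &amp; 0 \\
R_{0y} &amp; R_{1y} &amp; R_{2y} &amp; 0 \\
R_{0z} &amp; R_{1z} &amp; R_{2z} &amp; 0 \\
-dot(T,R_0) &amp; -dot(T,R_1) &amp; -dot(T,R_2) &amp; 1
\end{bmatrix}\]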

<p>This is also seen in the Dunia engine and, honestly, in most game engines.</p>
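<p>The derivation above can be sketched in a few lines of C++. This is a generic illustration, not engine code: <code class="language-plaintext highlighter-rouge">Mat4</code> and <code class="language-plaintext highlighter-rouge">invertRigid</code> are hypothetical names, and the matrix is assumed to be row-major with the translation in the bottom row, matching \(C_{world}\) above.</p>

```cpp
#include <cassert>

// Hypothetical 4x4 type: row-major, translation stored in the bottom row.
struct Mat4 { float m[4][4]; };

// Fast inverse for a rotation + translation matrix.
// Only valid when the upper 3x3 is orthonormal (no scale, no shear).
Mat4 invertRigid(const Mat4& c) {
    assert(c.m[3][3] == 1.0f);  // expects the homogeneous corner to be 1
    Mat4 inv = {};              // zero-initialize, so the last column starts as 0
    for (int i = 0; i < 3; ++i) {
        // Upper 3x3: R^-1 = R^T
        for (int j = 0; j < 3; ++j) {
            inv.m[i][j] = c.m[j][i];
        }
        // Bottom row: -dot(T, R_i), where R_i is row i of the original rotation
        inv.m[3][i] = -(c.m[3][0] * c.m[i][0] +
                        c.m[3][1] * c.m[i][1] +
                        c.m[3][2] * c.m[i][2]);
    }
    inv.m[3][3] = 1.0f;
    return inv;
}
```

<p>That is nine copies and three dot products, with no determinant and no division anywhere.</p>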

<h3 id="dunia-engine-example-for-view-matrix-fast-inverse">Dunia Engine Example for View Matrix Fast Inverse:</h3>

<div class="ida-code">  <span class="comment">// Dot Product of: Right • CamPos</span>
  <span class="var">rightTrans</span> = (<span class="kw">float</span>)((<span class="kw">float</span>)(<span class="var">rightY</span> * <span class="var">CamPos_XYZ</span>[<span class="num">1</span>]) + (<span class="kw">float</span>)(<span class="var">rightX</span> * *<span class="var">CamPos_XYZ</span>))
             + (<span class="kw">float</span>)(<span class="var">rightZ</span> * <span class="var">CamPos_XYZ</span>[<span class="num">2</span>]);
  <span class="var">forwardZ</span> = *(<span class="kw">float</span> *)(<span class="var">a1</span> + <span class="num">0x88</span>);
  <span class="var">upX</span> = *(<span class="kw">float</span> *)(<span class="var">a1</span> + <span class="num">0x90</span>);
  <span class="var">upY</span> = *(<span class="kw">float</span> *)(<span class="var">a1</span> + <span class="num">0x94</span>);

  <span class="comment">// Dot Product of: Forward • CamPos</span>
  <span class="var">forwardTrans</span> = (<span class="kw">float</span>)((<span class="kw">float</span>)(<span class="var">forwardY</span> * <span class="var">CamPos_XYZ</span>[<span class="num">1</span>]) + (<span class="kw">float</span>)(<span class="var">forwardX</span> * *<span class="var">CamPos_XYZ</span>))
               + (<span class="kw">float</span>)(<span class="var">forwardZ</span> * <span class="var">CamPos_XYZ</span>[<span class="num">2</span>]);
  <span class="var">upZ</span> = *(<span class="kw">float</span> *)(<span class="var">a1</span> + <span class="num">0x98</span>);

  <span class="comment">// Dot product of: up • CamPos (upTrans + v18)</span>
  *(<span class="kw">float</span> *)&amp;<span class="var">v18</span> = <span class="var">upZ</span> * <span class="var">CamPos_XYZ</span>[<span class="num">2</span>];
  <span class="var">upTrans</span> = (<span class="kw">float</span>)(<span class="var">upY</span> * <span class="var">CamPos_XYZ</span>[<span class="num">1</span>]) + (<span class="kw">float</span>)(<span class="var">upX</span> * *<span class="var">CamPos_XYZ</span>);</div>

<p>The standard Camera Matrix layout (as stored in memory) is:</p>

\[C_{world} =
\begin{bmatrix}
r_x &amp; r_y &amp; r_z &amp; 0 \\
u_x &amp; u_y &amp; u_z &amp; 0 \\
f_x &amp; f_y &amp; f_z &amp; 0 \\
p_x &amp; p_y &amp; p_z &amp; 1.0
\end{bmatrix}\]

<p>Here, the right vector would be stored like so:</p>

<p><code class="language-plaintext highlighter-rouge">*(float *)(a1 + 0x30) = rightX;   *(float *)(a1 + 0x34) = rightY;   *(float *)(a1 + 0x38) = rightZ;</code></p>

<p>The Dunia engine instead writes the basis vectors transposed and fills the last row with the negated dot products:</p>

<div class="ida-code">  <span class="comment">// Fast inverse for orthonormal matrices (View Matrix Construction)</span>
  *(<span class="kw">float</span> *)(<span class="var">a1</span> + <span class="num">0x30</span>) = <span class="var">rightX</span>;
  *(<span class="kw">float</span> *)(<span class="var">a1</span> + <span class="num">0x40</span>) = <span class="var">rightY</span>;
  *(<span class="kw">float</span> *)(<span class="var">a1</span> + <span class="num">0x50</span>) = <span class="var">rightZ</span>;
  *(<span class="kw">float</span> *)(<span class="var">a1</span> + <span class="num">0x34</span>) = <span class="var">forwardX</span>;
  *(<span class="kw">float</span> *)(<span class="var">a1</span> + <span class="num">0x44</span>) = <span class="var">forwardY</span>;
  *(<span class="kw">float</span> *)(<span class="var">a1</span> + <span class="num">0x54</span>) = <span class="var">forwardZ</span>;
  *(<span class="kw">float</span> *)(<span class="var">a1</span> + <span class="num">0x58</span>) = <span class="var">upZ</span>;
  *(<span class="kw">float</span> *)(<span class="var">a1</span> + <span class="num">0x38</span>) = <span class="var">upX</span>;
  *(<span class="kw">float</span> *)(<span class="var">a1</span> + <span class="num">0x48</span>) = <span class="var">upY</span>;
  *(<span class="kw">float</span> *)(<span class="var">a1</span> + <span class="num">0x60</span>) = -<span class="var">rightTrans</span>;
  *(<span class="kw">float</span> *)(<span class="var">a1</span> + <span class="num">0x64</span>) = -<span class="var">forwardTrans</span>;
  *(<span class="kw">float</span> *)(<span class="var">a1</span> + <span class="num">0x68</span>) = -(<span class="kw">float</span>)(<span class="var">upTrans</span> + *(<span class="kw">float</span> *)&amp;<span class="var">v18</span>);</div>
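<p>The offsets in that listing are easier to read as (row, column) pairs. Here is a small sketch of the mapping, assuming the matrix starts at <code class="language-plaintext highlighter-rouge">a1 + 0x30</code> and is stored row-major with 4-byte floats (16 bytes per row). Both assumptions are inferred from the listing, and <code class="language-plaintext highlighter-rouge">viewOffset</code> is a hypothetical helper.</p>

```cpp
#include <cstddef>

// Assumed base of the view matrix inside the object pointed to by a1.
constexpr std::size_t kViewMatrixBase = 0x30;

// Byte offset of element (row, col): 16 bytes per row, 4 bytes per float.
constexpr std::size_t viewOffset(int row, int col) {
    return kViewMatrixBase + 16u * static_cast<std::size_t>(row)
                           + 4u * static_cast<std::size_t>(col);
}

// The right vector runs down column 0 instead of along row 0: a transpose.
static_assert(viewOffset(0, 0) == 0x30, "rightX");
static_assert(viewOffset(1, 0) == 0x40, "rightY");
static_assert(viewOffset(2, 0) == 0x50, "rightZ");

// The negated dot products fill the translation row (row 3).
static_assert(viewOffset(3, 0) == 0x60, "-rightTrans");
static_assert(viewOffset(3, 1) == 0x64, "-forwardTrans");
static_assert(viewOffset(3, 2) == 0x68, "-(upTrans + v18)");
```

<p>Read this way, the twelve stores are exactly \(R^T\) in the upper 3×3 and \(-TR^T\) in the bottom row.</p>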

<h3 id="closing-thoughts">Closing thoughts:</h3>

<p>I’m all out of rants and tangents to go on about, so here are the main takeaways:</p>

<p><strong>1. Clean C++ Code ≠ Clean Compiled Code</strong></p>

<p><em>Abstraction is a luxury where the cost is performance.</em> Relying on generalized math wrappers and trusting the compiler to “figure it out” is how you end up with 133 instructions instead of 3.</p>

<p><strong>2. Profilers won’t save you</strong></p>

<p>If the entire foundation is bloated, the baseline execution cost of every function is artificially raised, so no single hotspot stands out in a profile. <em>This is profiler-invisible waste: death by a thousand cuts.</em></p>

<p><strong>3. Following the textbook perfectly</strong></p>

<p>Those matrix inversions, matrix multiplications, and identity matrices all look great on the whiteboard, but in a low-level CPU pipeline they are a really roundabout way of getting the result.</p>

<p><strong>4. The “Main Path” Contagion</strong></p>

<p>This is not even a niche, unimportant function. This is the main rendering function preparing many different matrices bound for the GPU for calculations.<br />
You can guarantee this exact same philosophy infests every other system in the engine.</p>

<p><strong>5. Who was the common antagonist anyway?</strong></p>

<p>You might have already guessed! It’s the <code class="language-plaintext highlighter-rouge">MatrixMultiply4x4()</code> but really that’s just the narrative for this write-up.</p>

<p>The true antagonist is the codebase culture itself. It is the “Clean C++” philosophy that prioritizes developer convenience and generic abstractions over CPU execution realities. This exact same over-engineering bleeds into entirely different mathematical primitives across the entire engine.</p>

<p>And it probably doesn’t stop at math libraries. Every other library is probably abused like this too.</p>

<p>In my next write-up, we are going to look at the exact opposite problem: the Compatibility Tax, the ghost of a 12-year-old CPU that keeps modern games from using instructions that could theoretically yield 5x speedups.</p>

<p><em>Until then, worship the IDA goddess!</em></p>

<div class="post-nav">
  <a href="/Redundancy-seen-in-AAA-game-engines/part-3-Avalanche-engine/">&laquo; Part 3: Avalanche Engine</a>
</div>]]></content><author><name>z1rp</name></author><summary type="html"><![CDATA[]]></summary></entry><entry><title type="html">Part 1: Introduction</title><link href="https://zero-irp.github.io/Redundancy-seen-in-AAA-game-engines/part-1-intro/" rel="alternate" type="text/html" title="Part 1: Introduction" /><published>2026-04-03T18:30:00+00:00</published><updated>2026-04-03T18:30:00+00:00</updated><id>https://zero-irp.github.io/Redundancy-seen-in-AAA-game-engines/part-</id><content type="html" xml:base="https://zero-irp.github.io/Redundancy-seen-in-AAA-game-engines/part-1-intro/"><![CDATA[]]></content><author><name>z1rp</name></author><summary type="html"><![CDATA[]]></summary></entry></feed>