4.4: Optimizing Redundant SIMD Instructions
Now it's time to optimize the redundant SIMD operations we have seen so far.
Expect no performance gain; this is purely a learning exercise.
As a refresher, we took a bottom-up approach to track down the functions that lead to the Construction Function.
The call chain we traced back was:
sub_9AA040 → sub_7FBF10 → sub_7FD060
Let's first try to optimize sub_7FD060, which we previously renamed to ProjXViewMul.
The redundant, overly complicated SIMD code was:
We can achieve the same functionality by simply doing:
But we cannot just type that into IDA's pseudocode; we need to patch it manually in assembly.
Finding the assembly responsible for this function:
To find the assembly instructions that correspond to our pseudocode, we will use IDA's synchronize feature.

Highlighting what we are looking for:

Now we see the corresponding assembly:

Here I will use the quick and easy method: NOP out every instruction we don't need, find which xmm register holds ProjMat_3, and simply store it at the appropriate memory address.
Let's see which xmm register holds ProjMat_3.
Highlight it like before and we see it corresponds to:
After making sure xmm10 is not modified along the way, we can store it directly at:

xmm11 holds the computed output of the entire Vec4 x Mat4x4 multiplication; changing it to xmm10 achieves the same result.
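One plausible reason the full multiplication is redundant is that the vector operand is a constant unit vector, in which case a row-vector-times-matrix product just picks out one row of the matrix. The sketch below illustrates that idea in plain Python. The matrix values, the unit vector, and the assumption that ProjMat_3 is the row at index 3 are all hypothetical and not taken from the game.

```python
def vec4_mul_mat4(v, m):
    """Row vector times 4x4 matrix: out[j] = sum over i of v[i] * m[i][j]."""
    return [sum(v[i] * m[i][j] for i in range(4)) for j in range(4)]

# Hypothetical projection-like matrix; the values are made up for illustration.
proj_mat = [
    [1.5, 0.0, 0.0, 0.0],
    [0.0, 2.0, 0.0, 0.0],
    [0.0, 0.0, -1.002, -1.0],
    [0.0, 0.0, -0.2, 0.0],
]

# Assumption: the vector operand is the constant (0, 0, 0, 1).
unit_w = [0.0, 0.0, 0.0, 1.0]

full_product = vec4_mul_mat4(unit_w, proj_mat)  # what the long SIMD sequence computes
shortcut = proj_mat[3]                          # what storing the row directly gives

same_result = full_product == shortcut
```

If that assumption holds, the shuffles, multiplies, and adds feeding xmm11 contribute nothing beyond what the register that already holds the row contains, which is why storing that register is enough.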
We will NOP out all the other instructions.
Nopping Instructions:
All of these are going to be nopped out.
Now we NOP them all out and change xmm11 to xmm10 in the final movaps using Cheat Engine, then see if it works as intended.
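At the byte level, the patch described above amounts to two things: overwriting unwanted instructions with 0x90 (NOP), and changing the register field of the final movaps store. Below is a minimal sketch of both on an in-memory byte buffer. The helper names `nop_range` and `retarget_movaps_store` and the sample bytes are hypothetical; the real instructions in the game will use different addressing and lengths.

```python
NOP = 0x90  # single-byte x86 NOP

def nop_range(code: bytearray, start: int, length: int) -> None:
    """Overwrite `length` bytes starting at `start` with NOPs."""
    code[start:start + length] = bytes([NOP]) * length

def retarget_movaps_store(code: bytearray, offset: int, new_xmm: int) -> None:
    """Change the source register of a REX-prefixed `movaps m128, xmm`
    (encoded as REX 0F 29 /r) located at `offset`."""
    rex, op1, op2, modrm = code[offset:offset + 4]
    if (rex & 0xF0) != 0x40 or (op1, op2) != (0x0F, 0x29):
        raise ValueError("not a REX-prefixed movaps store")
    # REX.R (bit 2) carries bit 3 of the register number;
    # ModRM.reg (bits 5..3) carries the low three bits.
    code[offset] = (rex & ~0x04) | (((new_xmm >> 3) & 1) << 2)
    code[offset + 3] = (modrm & ~0x38) | ((new_xmm & 7) << 3)

# Hypothetical snippet, not the game's actual bytes:
#   45 0F 59 DA    mulps  xmm11, xmm10
#   44 0F 29 1F    movaps [rdi], xmm11
code = bytearray.fromhex("450f59da440f291f")

nop_range(code, 0, 4)               # drop the multiply we no longer need
retarget_movaps_store(code, 4, 10)  # store xmm10 instead of xmm11

patched = bytes(code)               # 90 90 90 90 44 0F 29 17
```

Because xmm10 and xmm11 differ only in the lowest bit of the register number, the register swap changes a single bit in the ModRM byte and leaves the instruction length untouched, so nothing after it needs to move.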


Result?
The game runs as expected, without any artifacts in game or in the UI:
