Now it’s time to optimize the redundant SIMD operations we have seen so far.

Expect no performance gain; this is purely a learning exercise.

As a refresher, we took a bottom-up approach to find the functions leading to the Construction Function.

Functions traced back:

sub_9AA040 → sub_7FBF10 → sub_7FD060

Let’s first try to optimize sub_7FD060, which we previously renamed to ProjXViewMul.

The redundant, overly complicated SIMD expression was:

v16[3] = _mm_add_ps(
    _mm_add_ps(
        _mm_add_ps(
            _mm_mul_ps(_mm_shuffle_ps((__m128)Mask_0001f, (__m128)Mask_0001f, 0), ProjMat_0),
            _mm_mul_ps(_mm_shuffle_ps((__m128)Mask_0001f, (__m128)Mask_0001f, 0x55), ProjMat_1)),
        _mm_mul_ps(_mm_shuffle_ps((__m128)Mask_0001f, (__m128)Mask_0001f, 0xAA), ProjMat_2)),
    _mm_mul_ps(_mm_shuffle_ps((__m128)Mask_0001f, (__m128)Mask_0001f, 0xFF), ProjMat_3));

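Assuming Mask_0001f holds (0, 0, 0, 1), as its name suggests, the first three products are all zero and the last one is ProjMat_3 multiplied by 1. We can get the same result by simply doing:

v16[3] = ProjMat_3;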
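To convince ourselves of the equivalence outside the game, here is a minimal C sketch that evaluates the original shuffle/multiply/add chain and compares it with the last row. It assumes Mask_0001f is (0, 0, 0, 1) in memory order; the helper names and the row values are made up for illustration and are not taken from the binary. Note that the two forms only match for finite matrix values, since 0 times Inf or NaN yields NaN.

```c
#include <xmmintrin.h> /* SSE intrinsics: __m128, _mm_shuffle_ps, ... */

/* Same shape as the decompiled expression: broadcast each lane of v
   and multiply it with the matching row, then sum the four products. */
static __m128 vec4_times_rows(__m128 v, __m128 r0, __m128 r1,
                              __m128 r2, __m128 r3)
{
    __m128 x = _mm_shuffle_ps(v, v, 0x00); /* lane 0 in all lanes */
    __m128 y = _mm_shuffle_ps(v, v, 0x55); /* lane 1 in all lanes */
    __m128 z = _mm_shuffle_ps(v, v, 0xAA); /* lane 2 in all lanes */
    __m128 w = _mm_shuffle_ps(v, v, 0xFF); /* lane 3 in all lanes */
    return _mm_add_ps(
        _mm_add_ps(
            _mm_add_ps(_mm_mul_ps(x, r0), _mm_mul_ps(y, r1)),
            _mm_mul_ps(z, r2)),
        _mm_mul_ps(w, r3));
}

/* Returns 1 when all four lanes of a and b compare equal. */
static int same_vec(__m128 a, __m128 b)
{
    return _mm_movemask_ps(_mm_cmpeq_ps(a, b)) == 0xF;
}

/* Returns 1 if the full product equals the last row for sample data. */
static int demo_equivalence(void)
{
    /* Assumption: Mask_0001f = (0, 0, 0, 1), lane 0 first. */
    __m128 mask = _mm_setr_ps(0.0f, 0.0f, 0.0f, 1.0f);
    /* Hypothetical finite row values standing in for ProjMat_0..3. */
    __m128 r0 = _mm_setr_ps(1.5f, 0.0f, 0.0f, 0.0f);
    __m128 r1 = _mm_setr_ps(0.0f, 2.25f, 0.0f, 0.0f);
    __m128 r2 = _mm_setr_ps(0.0f, 0.0f, -1.0f, -1.0f);
    __m128 r3 = _mm_setr_ps(3.0f, -4.0f, 0.5f, 7.0f);
    return same_vec(vec4_times_rows(mask, r0, r1, r2, r3), r3);
}
```

Calling demo_equivalence() from a small test program should return 1 for any finite row values.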

But we cannot just type this into IDA’s pseudocode; we need to patch it manually in assembly.

Finding the assembly responsible for this expression:

To find the assembly instructions that correspond to our pseudocode, we will use IDA’s synchronize feature.

ESP-Image1

Highlighting what we are looking for:

ESP-Image1

Now we see the corresponding assembly:

ESP-Image1

Here I will use the quick and easy method: nop out every instruction we don’t need, find which xmm register holds ProjMat_3, and simply store it at the appropriate memory address:

movaps v16[3], xmm

Let’s see which xmm register holds ProjMat_3:

ProjMat_3 = *(__m128*)(camStruct + 0x230);

Highlight this like before and we see it corresponds to:

movups xmm10, xmmword ptr [rcx+230h]

After making sure xmm10 is not modified along the way, we can store it directly at:

ESP-Image1

xmm11 holds the computed output of the entire Vec4 x Mat4x4 product; changing it to xmm10 in this store will achieve the same result.

We will nop out all the other instructions:
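At the byte level, swapping xmm11 for xmm10 in a store like this only touches the 3-bit reg field of the ModRM byte, because both registers need the same REX.R prefix bit (xmm10 is 8+2, xmm11 is 8+3). The sketch below shows that field rewrite in C. The helper name is hypothetical, and the example bytes in the comment are an illustrative encoding of `movaps [rsp+30h], xmm11`, not the actual bytes from the game. Cheat Engine's assembler does this for you when you edit the instruction.

```c
#include <stdint.h>

/* Replace the reg field (bits 5:3) of a ModRM byte.
   Only the low 3 bits of the register number live in ModRM;
   bit 3 of the register number is carried by REX.R in the prefix. */
static uint8_t modrm_set_reg(uint8_t modrm, unsigned reg)
{
    return (uint8_t)((modrm & 0xC7u) | ((reg & 7u) << 3));
}

/* Illustrative encoding (an assumption, not read from the binary):
     44 0F 29 5C 24 30   movaps [rsp+30h], xmm11
   Rewriting byte 3 with modrm_set_reg(0x5C, 10) gives 0x54:
     44 0F 29 54 24 30   movaps [rsp+30h], xmm10 */
```

Since the REX prefix stays the same, the instruction length does not change and nothing after it needs to move.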

Nopping Instructions:

movups  xmm4, cs:Mask_0001f
movaps  xmm11, xmm4
shufps  xmm11, xmm4, 0
movaps  xmm0, xmm4
mulps   xmm11, xmm6
shufps  xmm0, xmm4, 55h ; 'U'
mulps   xmm0, xmm7
movaps  xmm1, xmm4
shufps  xmm1, xmm4, 0AAh
addps   xmm11, xmm0
mulps   xmm1, xmm9
shufps  xmm4, xmm4, 0FFh
addps   xmm11, xmm1
mulps   xmm4, xmm10
addps   xmm11, xmm4

Now we nop all of these out in Cheat Engine, change xmm11 to xmm10 in the final movaps, and see if it works as intended.
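For reference, "nopping out" just means overwriting the bytes of those instructions with 0x90, the single-byte x86 NOP. A minimal C sketch of that operation on a local copy of the code bytes is shown below; the function name and the idea of working on a buffer rather than the live process are assumptions for illustration, and the caller is responsible for making the range cover whole instructions only.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define X86_NOP 0x90u /* single-byte NOP opcode on x86/x86-64 */

/* Overwrite len bytes of code starting at offset with NOPs.
   Returns 0 on success, -1 if the range falls outside the buffer.
   The range must start and end on instruction boundaries, otherwise
   the remaining bytes decode as garbage instructions. */
static int nop_range(uint8_t *code, size_t code_size,
                     size_t offset, size_t len)
{
    if (offset > code_size || len > code_size - offset)
        return -1;
    memset(code + offset, X86_NOP, len);
    return 0;
}
```

In practice, Cheat Engine's "Replace with code that does nothing" performs the same overwrite directly in the running process.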

ESP-Image1

ESP-Image1

Result?

The game runs as expected, without any artifacts in the game or in the UI:

ESP-Image1