Temporal Anti-Aliasing (TAA): Matrix Multiplication Replaces Vector Addition

ESP-Image1

Here we see a very “textbook” way of adding jitter to the projection matrix for SMAA_T2X in the Avalanche Engine.

The classic textbook approach is to multiply the projection matrix by a jitter matrix:

\[\begin{bmatrix} x_{scale} & 0 & 0 & 0 \\ 0 & y_{scale} & 0 & 0 \\ 0 & 0 & \dfrac{z_{far}}{z_{far}-z_{near}} & 1 \\ 0 & 0 & -\dfrac{z_{near}z_{far}}{z_{far}-z_{near}} & 0 \end{bmatrix} \times \begin{bmatrix} 1 & 0 & 0 & 0 \\ 0 & 1 & 0 & 0 \\ 0 & 0 & 1 & 0 \\ j_x & j_y & 0 & 1 \end{bmatrix} = \begin{bmatrix} x_{scale} & 0 & 0 & 0 \\ 0 & y_{scale} & 0 & 0 \\ j_x & j_y & \dfrac{z_{far}}{z_{far}-z_{near}} & 1 \\ 0 & 0 & -\dfrac{z_{near}z_{far}}{z_{far}-z_{near}} & 0 \end{bmatrix}\]

This looks clean in the source code, but in a low-level CPU render loop it is inefficient.

In C++ it probably looked something like this:

Matrix4x4 jitterMatrix = Matrix4x4::Identity();
jitterMatrix.m[3][0] = jX;
jitterMatrix.m[3][1] = jY;
projMatrix = projMatrix * jitterMatrix;

Again: Clean C++ Code ≠ Clean Compiled Code

  • First, construct an entire 4x4 identity matrix on the stack just to hold two float values.
  • Then pass both matrices as arguments to the multiply function.
  • Inside the function, allocate stack space, set up the security cookie, load registers, etc.
  • Multiply a row by every column, and repeat that for each of the 4 rows.
  • Finally, leave the function: store the result to memory and registers, verify the security cookie, and deallocate the stack.
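To make the comparison concrete, here is a minimal sketch of both paths. `Mat4`, `multiply`, `jitterByMultiply` and `jitterByRowAdd` are hypothetical names of mine, not the engine's; the sketch assumes a row-major matrix laid out like the one in the equation above.

```cpp
#include <array>
#include <cassert>
#include <cstddef>

// Hypothetical stand-in for the engine's matrix type (row-major).
using Mat4 = std::array<std::array<float, 4>, 4>;

// Generic 4x4 multiply: 16 dot products of 4 terms each.
Mat4 multiply(const Mat4& a, const Mat4& b) {
    Mat4 r{};  // zero-initialised accumulator
    for (std::size_t i = 0; i < 4; ++i)
        for (std::size_t j = 0; j < 4; ++j)
            for (std::size_t k = 0; k < 4; ++k)
                r[i][j] += a[i][k] * b[k][j];
    return r;
}

// The "textbook" path: build an identity, poke in two floats, multiply.
Mat4 jitterByMultiply(const Mat4& proj, float jX, float jY) {
    Mat4 jitter{};
    for (std::size_t i = 0; i < 4; ++i) jitter[i][i] = 1.0f;
    jitter[3][0] = jX;
    jitter[3][1] = jY;
    return multiply(proj, jitter);
}

// The shortcut: add the jitter to row 2 and touch nothing else.
// Only equivalent when row 2 has w = 1 and every other row has w = 0,
// which holds for the perspective matrix shown above.
Mat4 jitterByRowAdd(Mat4 proj, float jX, float jY) {
    assert(proj[2][3] == 1.0f);
    assert(proj[0][3] == 0.0f && proj[1][3] == 0.0f && proj[3][3] == 0.0f);
    proj[2][0] += jX;
    proj[2][1] += jY;
    return proj;
}
```

For a projection matrix of that shape, both functions should return the same matrix, which is the whole point: the multiply does 16 dot products to change two floats.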

Note: Calculating curFrame & 1, selecting the jitters, and scaling them down to sub-pixel space are all mathematically necessary steps, not over-engineering.
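For reference, those necessary steps could look roughly like the sketch below. The offsets in `kJitterPixels` are the usual SMAA T2x pair as I remember them, and the pixel-to-clip-space scale of 2/width and 2/height is my assumption; the names are hypothetical and the engine's actual values may differ.

```cpp
#include <cassert>
#include <cstdint>

struct Jitter { float x, y; };

// Assumed SMAA T2x sub-pixel offsets, in pixel units (not taken from the engine).
constexpr Jitter kJitterPixels[2] = { {0.25f, -0.25f}, {-0.25f, 0.25f} };

// Picks the offset for this frame and scales it from pixels to clip space.
// Clip space spans 2 units across the viewport, so one pixel is 2/width wide.
Jitter selectJitter(std::uint32_t curFrame, int width, int height) {
    assert(width > 0 && height > 0);
    const Jitter& j = kJitterPixels[curFrame & 1];  // alternate every frame
    return { j.x * 2.0f / static_cast<float>(width),
             j.y * 2.0f / static_cast<float>(height) };
}
```

Calling `selectJitter(curFrame, 1920, 1080)` would give the two values that end up as jX and jY in the matrix.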

The easier way to do it would be:

  • Take the scaled-down jitters.
  • Take row 2 (counting from 0) of the projection matrix and add the jitters to it with a single addps.

Example (where the jitters were already scaled down):

movaps xmm0, projMat_2  ; Load 2nd Row [0, 0, Z_scale, 1]
movaps xmm1, jitter_Row ; [jitX, jitY, 0, 0]

addps xmm0, xmm1 ; result: [jX, jY, Z_scale, 1]

That drops the instruction count from roughly 120 to 3.
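You do not need hand-written assembly to get there. A minimal sketch with SSE intrinsics, assuming an x86 target and a 16-byte-aligned matrix (`ProjMatrix` and `applyJitter` are hypothetical names, not the engine's):

```cpp
#include <cassert>
#include <cstdint>
#include <xmmintrin.h>  // SSE: _mm_load_ps, _mm_set_ps, _mm_add_ps, _mm_store_ps

// Hypothetical layout: each row is 16-byte aligned, as an aligned load requires.
struct alignas(16) ProjMatrix { float m[4][4]; };

// Adds the pre-scaled jitter to row 2 in place: one load, one add, one store,
// plus whatever the compiler needs to build the jitter vector.
void applyJitter(ProjMatrix& proj, float jX, float jY) {
    assert(reinterpret_cast<std::uintptr_t>(proj.m[2]) % 16 == 0);
    const __m128 row    = _mm_load_ps(proj.m[2]);          // [0, 0, Z_scale, 1]
    const __m128 jitter = _mm_set_ps(0.0f, 0.0f, jY, jX);  // [jX, jY, 0, 0]
    _mm_store_ps(proj.m[2], _mm_add_ps(row, jitter));      // [jX, jY, Z_scale, 1]
}
```

Note that `_mm_set_ps` takes its arguments from the highest lane down, which is why jX comes last. With optimizations on, a compiler will typically turn this into a handful of instructions rather than a function call with a full matrix multiply.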
