Part 2.1: View Matrix

Introduction

In 3D computer graphics, transformation matrices are the behind-the-scenes math that turns 3D world into something that actually shows up on your 2D screen. Three particularly important matrices work together to take 3D world coordinates and transform them into 2D screen coordinates: the View Matrix, Projection Matrix, and their combination, the View Projection Matrix

Camera World Matrix

Before jumping into the view matrix we first have to know about the camera world matrix and its relation with the view matrix (some people refer to it by different names, but I’ll just call it the camera world matrix)

The camera world matrix is basically the camera’s own transform. It describes how the camera is oriented and positioned in world space

It usually looks like this (in row-major form):

right.x	right.y	right.z	0
up.x	up.y	up.z	0
Forward.x	Forward.y	Forward.z	0
pos.x	pos.y	pos.z	1

Position Vector: The position of the camera ‘(x,y,z)’ in world space.

we will discuss the right/up/forward vectors in the View Matrix segment and later how this layout differs from Ghost of Tsushima.

What is a View-Matrix?

The view matrix is a 4x4 matrix that transforms coordinates from world space into view (or camera) space. It does not tell us where the camera is, that’s what the camera world matrix is for. Instead, the view matrix takes everything in the world and repositions it relative to the camera, as if the camera is sitting at the origin (0, 0, 0) and looking straight down its forward axis.

world space -> view space

So in a sense, it’s not the camera that moves but it’s the world that shifts and rotates around the camera

The view matrix is just the inverse of the camera world matrix.
Since the camera world matrix places the camera in the world, its inverse effectively ‘un-places’ the world relative to the camera, resulting in the view matrix.

Layout Of the View Matrix

Here is a row-major, left-handed layout of how the View Matrix looks like

Right.x	Up.x	Forward.x	0
Right.y	Up.y	Forward.y	0
Right.z	Up.z	Forward.z	0
-dot(Right, Position)	-dot(Up, Position)	-dot(Forward, Position)	1

This layout isn’t a hard rule, it can change depending on things like whether the engine uses a right-handed or left handed coordinate system, whether it stores matrices in row-major or column-major order, or even custom layouts used for optimization. What matters is consistency: as long as the engine expects that specific layout, it will work. But no matter how it’s arranged, the matrix still needs the same essential components: Right, Up, Forward, and Translation.

This 4x4 Matrix layout is common in DirectX Games where it is usually row-major and left-handed.
This View Matrix is made up of 4 fundamental vectors:

Right Vector: which direction is right from the camera’s perspective
Up Vector: which direction is up from the camera’s perspective
Forward Vector: which direction the camera is facing
Translation Vector: The component that translates the world to the camera’s origin

Before diving into the fundamental vectors, let’s take a moment to understand row vs. column major order and left- vs. right-handed coordinate systems:

Row-Major: The matrix is constructed row by row.
Column-Major: The Matrix is constructed column by column.

Eg: The right Vector would have occupied the first row in row-major Matrix but the first column in column-major Matrix.

Left-Handedness: +Z points into the screen.
Right-Handedness: +Z points out of the screen.

This is why in Right-Handed systems the forward Vector will be negated.

Breaking Down the View Matrix Vectors

The Right, Up and Forward vectors in your matrix define the orientation of an object or camera in 3D space.

Right Vector (X-axis):
The Right vector is a directional vector that points to the right-hand side of the camera in the camera’s local space. It Defines the camera’s local X-axis

Up Vector (Y-axis):
This is also a directional Vector but points toward the top side of the camera. It points straight up from the camera’s perspective. It defines the camera’s local Y-axis

Forward Vector:
This Vector points in the direction the camera is facing. (Where the camera is looking at) It defines the camera’s local Z-axis

The forward vector direction is typically (LookAtTarget - CameraPosition).normalized() for Left-Handed systems like Direct-X but for right-handed systems the view matrix uses the negative Z-axis, which would be (CameraPosition - LookAtTarget).normalized().

The components of orientation vectors (Right, Up, Forward) will always be in the range of -1.0 to 1.0.

Translation Vector:
This translation vector basically tells the view matrix how to undo the camera’s position and orientation so that the world moves around the camera and not the other way around. This is derived from a dot product between the orientation vectors and the camera’s position. In the view matrix, this position is negated to move the world around the camera.

Orthogonality and Relations:- (Optional/Good to know)

For a camera to behave properly in 3D space, its orientation vectors (Right, Up, Forward) must be orthogonal (Perpendicular) to each other.
In Vector Algebra we know 2 perpendicular vector’s dot product is always 0.

Thus:
dot(Right, Up) = 0
dot(Up, Forward) = 0
dot(Forward, Right) = 0

In vector algebra, we also have the cross product and the right-hand rule, which states:
The cross product of two vectors “A” and “B” results in a new vector (C = A x B). This resultant vector “C” is always perpendicular (orthogonal) to both original vectors “A” and “B”.

Thus:
cross(Up, Right) = Forward
cross(Right, Forward) = Up
cross(Forward, Up) = Right
(For Left-Handed systems)

This returns a new vector perpendicular to the two input vectors. This relationship ensures all three orientation axes form a proper coordinate system

Model Matrix:- (Optional/Good to know)

Before we move forward, it’s worth mentioning the Model Matrix
The Model Matrix is defined by an object’s position, rotation, and scale. It transforms the object’s vertices from their local model space into world space. The X,Y,Z coordinates of these vertices are defined relative to the object’s center, This Matrix is responsible for transforming a 3D model from its local space into world space handling the model’s position, rotation and scale.

Model space -> World space

For the purpose of world-to-screen projection (basic ESP overlay), we don’t actually need the model matrix.
Why?
The esp we are building only needs to know where something is in world space and the game has already computed and placed the model in world space. Since the model matrix just moves the object from local space to world space, and we already have the world position, it becomes redundant in this context.
We don’t use this, but it’s worth knowing because the Camera World Matrix we discussed earlier is just a special case of a Model Matrix: it represents the camera’s position and orientation in world space, just like a model matrix does for any object.

Quick Note: While a model matrix typically includes rotation, translation, and scale, the camera world matrix only includes rotation and translation. Structurally, they share the same layout, the camera world matrix is essentially a model matrix for the camera itself.

If you want to learn more about model, view and projection matrices you can check out this OpenGL tutorial and this video which i found informative explaining the math behind matrices and 3d graphics theory:
OpenGL tutorial
In Video Games, The Player Never Moves

All of these mathematical relationships, spatial orientations, and foundational concepts might seem unnecessary but they are crucial. We’ll rely on this knowledge to accurately identify the different matrices in memory, decode how the game constructs them, and understand exactly how the camera views the world.