Homogeneous Coordinate

In mathematics, homogeneous coordinates are a Coordinate System used in projective geometry, just as Cartesian Coordinates are used in Euclidean Geometry.

In this article: https://articulatedrobotics.xyz/4-translations/, they talk about how we use homogeneous coordinates to represent a translation as a Linear Transformation.
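A minimal 2D sketch of that idea in plain Python. The function names are my own and not taken from the article. A 2x2 matrix can never translate, because a linear map always sends the origin to itself. Appending $w = 1$ to the point lets a 3x3 matrix carry the offset in its last column.

```python
def translation_matrix_2d(tx, ty):
    """3x3 homogeneous matrix that translates 2D points by (tx, ty)."""
    return [
        [1.0, 0.0, tx],
        [0.0, 1.0, ty],
        [0.0, 0.0, 1.0],
    ]

def mat_vec(m, v):
    """Multiply a matrix (list of rows) by a column vector (list)."""
    return [sum(m_ij * v_j for m_ij, v_j in zip(row, v)) for row in m]

# The point (2, 3) written in homogeneous form with w = 1.
p = [2.0, 3.0, 1.0]
T = translation_matrix_2d(5.0, -1.0)
print(mat_vec(T, p))  # [7.0, 2.0, 1.0]
```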

What I remember

I simply remember from Cyrill Stachniss that we can drop the last coordinate once it equals 1. To get there, we normalize the vector by dividing every component by that last value.
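A minimal sketch of that round trip in plain Python. The helper names are hypothetical. The code divides by the last component and then drops it. It refuses $w = 0$, because that vector has no Cartesian equivalent.

```python
def to_homogeneous(p):
    """Append w = 1 to a Cartesian point."""
    return list(p) + [1.0]

def from_homogeneous(h):
    """Divide every component by the last one (w), then drop it."""
    w = h[-1]
    if w == 0:
        raise ValueError("w = 0 is a point at infinity; it has no Cartesian equivalent")
    return [c / w for c in h[:-1]]

print(to_homogeneous([1.0, 2.0, 3.0]))        # [1.0, 2.0, 3.0, 1.0]
print(from_homogeneous([2.0, 4.0, 6.0, 2.0]))  # [1.0, 2.0, 3.0]
```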

Watch the 3b1b video?

Or the Cyrill Stachniss video on homogeneous coordinates?

When representing a point $(x, y, z)$ in 3D space using homogeneous coordinates, it is extended to $(x', y', z', w)$.

Here $x' = w x$, $y' = w y$, and $z' = w z$: the original coordinates are scaled by a factor of $w$.
The value of $w$ can be any non-zero number. It is the scaling factor applied to the coordinates, and dividing $x'$, $y'$, $z'$ by $w$ recovers $(x, y, z)$.
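A small sketch in plain Python, with hypothetical helper names. It shows that every non-zero $w$ gives a representation of the same 3D point. Two homogeneous vectors are compared by dehomogenizing both. The comparison assumes both $w$ values are non-zero.

```python
def scale_homogeneous(p, w):
    """Homogeneous representation (w*x, w*y, w*z, w) of the 3D point p."""
    if w == 0:
        raise ValueError("w must be non-zero")
    return [w * c for c in p] + [w]

def same_point(a, b, tol=1e-9):
    """True if two homogeneous 4-vectors represent the same 3D point.
    Compares the dehomogenized coordinates (assumes both w are non-zero)."""
    pa = [c / a[-1] for c in a[:-1]]
    pb = [c / b[-1] for c in b[:-1]]
    return all(abs(x - y) <= tol for x, y in zip(pa, pb))

p = [1.0, 2.0, 3.0]
print(scale_homogeneous(p, 2.0))   # [2.0, 4.0, 6.0, 2.0]
print(scale_homogeneous(p, -0.5))  # [-0.5, -1.0, -1.5, -0.5]
print(same_point(scale_homogeneous(p, 2.0), scale_homogeneous(p, -0.5)))  # True
```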

#todo I am still confused on the motivation

Homogeneous coordinates let different types of transformations, such as translation, rotation, scaling, and perspective projection, be combined into a single matrix multiplication. In 3D, this is done by extending the 3x3 transformation matrices to 4x4 matrices. The extra column carries the translation. The extra row is $(0, 0, 0, 1)$ for rigid and affine transforms, which keeps $w = 1$, while a projection uses a different bottom row to produce the $w$ that is later divided out.
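A sketch in plain Python, with illustrative helper names that are not from any library. A rotation about $z$ and a translation are multiplied into one 4x4 matrix. The code then checks that applying this single matrix matches applying the two transforms one after the other. Order matters: `T @ R` rotates first and then translates.

```python
import math

def mat_mul(a, b):
    """Matrix product a @ b for matrices stored as lists of rows."""
    return [[sum(a[i][k] * b[k][j] for k in range(len(b))) for j in range(len(b[0]))]
            for i in range(len(a))]

def mat_vec(m, v):
    """Matrix-vector product m @ v."""
    return [sum(m_ij * v_j for m_ij, v_j in zip(row, v)) for row in m]

def rotation_z(theta):
    """4x4 homogeneous rotation by theta radians about the z-axis."""
    c, s = math.cos(theta), math.sin(theta)
    return [[c, -s, 0.0, 0.0],
            [s, c, 0.0, 0.0],
            [0.0, 0.0, 1.0, 0.0],
            [0.0, 0.0, 0.0, 1.0]]

def translation(tx, ty, tz):
    """4x4 homogeneous translation by (tx, ty, tz)."""
    return [[1.0, 0.0, 0.0, tx],
            [0.0, 1.0, 0.0, ty],
            [0.0, 0.0, 1.0, tz],
            [0.0, 0.0, 0.0, 1.0]]

R = rotation_z(math.pi / 2)     # rotate 90 degrees about z
T = translation(1.0, 2.0, 3.0)  # then shift by (1, 2, 3)
M = mat_mul(T, R)               # one 4x4 matrix: rotate first, then translate

p = [1.0, 0.0, 0.0, 1.0]
combined = mat_vec(M, p)
sequential = mat_vec(T, mat_vec(R, p))
print([round(c, 6) for c in combined])    # [1.0, 3.0, 3.0, 1.0]
print([round(c, 6) for c in sequential])  # [1.0, 3.0, 3.0, 1.0]
```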

Ahh, it makes sense now, after my Nvidia interview.

See Homogeneous Linear Equation for a review of the fundamentals.

Spatial Algebra details how the homogeneous coordinates are used to do transforms.

Resources

Two great things

Transformations expressed by matrix multiplication (SE(3))
Points at infinity
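A sketch of the second point, assuming the usual convention that a vector with $w = 0$ is a point at infinity (a pure direction). The helper names are hypothetical. A translation moves an ordinary point, but it leaves a $w = 0$ vector unchanged, as a direction should be. Shrinking $w$ toward 0 pushes the Cartesian point farther out.

```python
def translation(tx, ty, tz):
    """4x4 homogeneous translation by (tx, ty, tz)."""
    return [[1.0, 0.0, 0.0, tx],
            [0.0, 1.0, 0.0, ty],
            [0.0, 0.0, 1.0, tz],
            [0.0, 0.0, 0.0, 1.0]]

def mat_vec(m, v):
    """Matrix-vector product m @ v."""
    return [sum(m_ij * v_j for m_ij, v_j in zip(row, v)) for row in m]

T = translation(10.0, 20.0, 30.0)
point = [1.0, 0.0, 0.0, 1.0]      # w = 1: an ordinary point
direction = [1.0, 0.0, 0.0, 0.0]  # w = 0: a point at infinity along +x

print(mat_vec(T, point))      # [11.0, 20.0, 30.0, 1.0]
print(mat_vec(T, direction))  # [1.0, 0.0, 0.0, 0.0]

# (1, 0, 0, w) dehomogenizes to (1/w, 0, 0), which moves off to +x as w -> 0.
for w in (1.0, 0.1, 0.001):
    print(1.0 / w)
```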

Normalized Coordinates (From Visual SLAM book)

I don’t quite understand this.

The projection process can also be viewed from another perspective. The camera projection model shows that we can first convert a world point into the camera coordinate system and then remove the last dimension. Removing the point's depth (its distance from the camera's imaging plane) is equivalent to normalizing by the last dimension. This gives the projection of the point $P$ on the camera's normalized plane:

$$(R P_{w} + t) = \underbrace{[X, Y, Z]^{T}}_{\text{camera coordinates}} \to \underbrace{[X / Z,\ Y / Z,\ 1]^{T}}_{\text{normalized coordinates}}$$
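A minimal sketch of this mapping in plain Python. It assumes $R$ is a 3x3 rotation stored as a list of rows and $t$ is a length-3 translation. The function names and the example pose are hypothetical. The code applies $R P_w + t$ and then divides by $Z$. It rejects $Z \le 0$, following the note on positive $Z$ below.

```python
def world_to_camera(R, t, p_w):
    """Camera coordinates P_c = R @ P_w + t (R: 3x3 list of rows, t: length-3)."""
    return [sum(R[i][j] * p_w[j] for j in range(3)) + t[i] for i in range(3)]

def to_normalized(p_c):
    """Project camera coordinates [X, Y, Z] onto the z = 1 plane: [X/Z, Y/Z, 1]."""
    X, Y, Z = p_c
    if Z <= 0:
        # Points at or behind the camera are not imaged.
        raise ValueError("point is not in front of the camera (Z <= 0)")
    return [X / Z, Y / Z, 1.0]

R = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]]  # identity rotation
t = [0.0, 0.0, 2.0]  # world origin lies 2 units in front of the camera
p_w = [1.0, 2.0, 2.0]

p_c = world_to_camera(R, t, p_w)  # [1.0, 2.0, 4.0]
print(to_normalized(p_c))         # [0.25, 0.5, 1.0]
```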

The normalized coordinates can be seen as a point in the $z = 1$ plane in front of the camera.

Note that the calculation must check that $Z$ is positive. Dividing by a negative $Z$ also places a point on the normalized plane, but the camera does not capture the scene behind the imaging plane.

This $z = 1$ plane is also called the normalized plane. We normalize the coordinates and then multiply them by the intrinsic matrix, which yields the pixel coordinates. We can also consider the pixel coordinates $[u, v]^{T}$ as the result of quantitative measurements of points on the normalized plane. If all three camera coordinates are multiplied by the same non-zero constant, the normalized coordinates stay the same, which means that depth is lost during projection. So, in monocular vision, a pixel's depth cannot be recovered from a single image.
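A sketch of the last two steps, assuming an ideal pinhole camera with no lens distortion. The intrinsic matrix has the common form $K = \begin{bmatrix} f_x & 0 & c_x \\ 0 & f_y & c_y \\ 0 & 0 & 1 \end{bmatrix}$. The numbers below are hypothetical. The code multiplies the normalized coordinates by $K$. It then shows that scaling the camera coordinates lands on the same pixel, which is the lost-depth effect described above.

```python
def normalized_to_pixel(K, p_n):
    """Pixel [u, v]: first two rows of K @ [x, y, 1] (K's last row is [0, 0, 1])."""
    return [sum(K[i][j] * p_n[j] for j in range(3)) for i in range(2)]

def project(K, p_c):
    """Camera coordinates -> normalized plane -> pixel coordinates."""
    X, Y, Z = p_c
    if Z <= 0:
        raise ValueError("point is not in front of the camera (Z <= 0)")
    return normalized_to_pixel(K, [X / Z, Y / Z, 1.0])

# Hypothetical intrinsics: fx = fy = 500 px, principal point (320, 240).
K = [[500.0, 0.0, 320.0],
     [0.0, 500.0, 240.0],
     [0.0, 0.0, 1.0]]

p_c = [1.0, 2.0, 4.0]
far = [3.0 * c for c in p_c]  # same ray, three times deeper

print(project(K, p_c))  # [445.0, 490.0]
print(project(K, far))  # [445.0, 490.0]
```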

🛠️ Steven Gong
