What I remember
I remember from Cyrill Stachniss that we can drop the last coordinate when it equals 1. If it is not 1, we first normalize the vector by dividing every coordinate by that value.
Watch the 3b1b video?
- Or the Cyrill Stachniss video on homogeneous coordinates?
When representing a point $(X, Y, Z)$ in 3D space using homogeneous coordinates, it is extended to the 4-vector $(x, y, z, w)$.
- $x$, $y$ and $z$ are obtained by scaling the original coordinates by a factor of $w$: $x = wX$, $y = wY$, $z = wZ$.
- The value of $w$ can be any non-zero value, and it represents the scaling factor applied to the coordinates.
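A minimal sketch of the "divide by the last coordinate, then drop it" rule from the top of this note, in plain Python. The names `to_homogeneous` and `from_homogeneous` are my own (hypothetical, not from any library). Because `from_homogeneous` divides by $w$ before dropping it, any non-zero $w$ gives back the same 3D point.

```python
from typing import Sequence, Tuple


# Hypothetical helpers for illustration; not from any library.
def to_homogeneous(p: Sequence[float], w: float = 1.0) -> Tuple[float, ...]:
    """(X, Y, Z) -> (wX, wY, wZ, w) for any non-zero w."""
    if w == 0:
        raise ValueError("w must be non-zero for a finite point")
    return tuple(w * c for c in p) + (w,)


def from_homogeneous(ph: Sequence[float]) -> Tuple[float, ...]:
    """(x, y, z, w) -> (x/w, y/w, z/w): divide by w, then drop it."""
    w = ph[-1]
    if w == 0:
        raise ValueError("w == 0 is a point at infinity; it has no 3D equivalent")
    return tuple(c / w for c in ph[:-1])


p = (1.0, 2.0, 3.0)
print(to_homogeneous(p))                        # (1.0, 2.0, 3.0, 1.0)
print(to_homogeneous(p, w=2.0))                 # (2.0, 4.0, 6.0, 2.0)
print(from_homogeneous((2.0, 4.0, 6.0, 2.0)))   # (1.0, 2.0, 3.0)
```

With $w = 1$ the division is a no-op, which is exactly the case where the last coordinate can simply be dropped.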
#todo I am still confused about the motivation.
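The use of homogeneous coordinates allows different types of transformations, such as translation, rotation, scaling, and perspective projection, to be combined into a single matrix multiplication. Translation on its own is not a linear map of 3D coordinates (it moves the origin), so no 3x3 matrix can express it. Extending the transformation matrices to 4x4 fixes this: the additional column holds the translation, and the additional row is $(0, 0, 0, 1)$ for rigid and affine transforms, while perspective projection uses other values there.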
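A small sketch of that idea, assuming plain Python nested lists for matrices (no NumPy) and helper names (`matmul`, `translation`, `rotation_z`, `apply`) that I made up for illustration. It builds a 90° rotation about $z$ and a translation as 4x4 matrices, multiplies them into one matrix, and shows that the order of the factors matters.

```python
import math
from typing import List, Sequence, Tuple

Matrix = List[List[float]]


def matmul(a: Matrix, b: Matrix) -> Matrix:
    """Plain matrix product a @ b for nested lists."""
    return [[sum(a[i][k] * b[k][j] for k in range(len(b))) for j in range(len(b[0]))]
            for i in range(len(a))]


def translation(tx: float, ty: float, tz: float) -> Matrix:
    # Hypothetical helper. Not linear on (x, y, z), but linear on (x, y, z, 1):
    # the offsets live in the extra column.
    return [[1.0, 0.0, 0.0, tx],
            [0.0, 1.0, 0.0, ty],
            [0.0, 0.0, 1.0, tz],
            [0.0, 0.0, 0.0, 1.0]]


def rotation_z(theta: float) -> Matrix:
    # Hypothetical helper: 3x3 rotation about z, padded with the extra row and column.
    c, s = math.cos(theta), math.sin(theta)
    return [[c, -s, 0.0, 0.0],
            [s, c, 0.0, 0.0],
            [0.0, 0.0, 1.0, 0.0],
            [0.0, 0.0, 0.0, 1.0]]


def apply(m: Matrix, p: Sequence[float]) -> Tuple[float, float, float]:
    """Lift p to w = 1, multiply by m, and divide back by w."""
    col = [[p[0]], [p[1]], [p[2]], [1.0]]
    x, y, z, w = (row[0] for row in matmul(m, col))
    return (x / w, y / w, z / w)


# One matrix for "rotate 90 degrees about z, then translate by (1, 0, 0)".
# The rightmost factor is applied first.
M = matmul(translation(1.0, 0.0, 0.0), rotation_z(math.pi / 2))
print([round(v, 9) for v in apply(M, (1.0, 0.0, 0.0))])          # [1.0, 1.0, 0.0]

# Swapping the order gives a different transform: translate first, then rotate.
M_swapped = matmul(rotation_z(math.pi / 2), translation(1.0, 0.0, 0.0))
print([round(v, 9) for v in apply(M_swapped, (1.0, 0.0, 0.0))])  # [0.0, 2.0, 0.0]
```

The rounding only hides the tiny floating-point residue of `math.cos(math.pi / 2)`.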
Ahh, it makes sense now, after my Nvidia interview.
See Homogeneous Linear Equation for a review of the fundamentals.
Spatial Algebra details how the homogeneous coordinates are used to do transforms.
Two great things
- Transformations expressed by matrix multiplication (SE(3))
- Points at infinity
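A sketch of the "points at infinity" idea, assuming a 4x4 homogeneous transform with the translation in its last column. A vector with $w = 0$ can't be divided back into a 3D point; it is the limit of $(x, y, z, w)$ as $w \to 0$, a point infinitely far away along the direction $(x, y, z)$. Translating it therefore changes nothing, while a finite point ($w = 1$) moves. The translation values and the `transform` helper are hypothetical.

```python
from typing import List, Sequence

Matrix = List[List[float]]


def transform(m: Matrix, ph: Sequence[float]) -> List[float]:
    """Hypothetical helper: multiply a 4x4 matrix by a homogeneous 4-vector."""
    return [sum(m[i][k] * ph[k] for k in range(4)) for i in range(4)]


# Made-up translation by (5, -2, 3).
T = [[1, 0, 0, 5],
     [0, 1, 0, -2],
     [0, 0, 1, 3],
     [0, 0, 0, 1]]

point = [1.0, 2.0, 3.0, 1.0]      # w = 1: a finite point
direction = [1.0, 2.0, 3.0, 0.0]  # w = 0: a point at infinity (a direction)

print(transform(T, point))      # [6.0, 0.0, 6.0, 1.0]  -> moved
print(transform(T, direction))  # [1.0, 2.0, 3.0, 0.0]  -> unchanged
```

The $w = 0$ entry multiplies the translation column, which is why only rotation (and scaling) act on directions.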
Normalized Coordinates (From Visual SLAM book)
I don’t quite understand this.
The projection process can also be viewed from another perspective. The formula above shows that we can first convert a world coordinate point to the camera coordinate system and then remove the last dimension, the depth $Z$ of the point from the camera's imaging plane, by dividing by it. This is equivalent to normalizing on the last dimension. In this way, we get the projection of the point on the camera normalized plane:

$$\boldsymbol{P}_c = [X, Y, Z]^T \rightarrow (X/Z, Y/Z, 1)^T$$
The normalized coordinates can be seen as a point in the $z = 1$ plane in front of the camera.
- Note that in the calculation, it is necessary to check whether $Z$ is positive, because a point with negative $Z$ also yields a point on the normalized plane by this method. However, the camera does not capture the scene behind the imaging plane.
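A minimal sketch of the world → camera → normalized-plane steps, including the $Z > 0$ check. It assumes the convention $\boldsymbol{P}_c = R\boldsymbol{P}_w + \boldsymbol{t}$; the pose values and function names are made up for illustration, and returning `None` for points behind the camera is just one possible design choice.

```python
from typing import Optional, Sequence, Tuple

Vec3 = Tuple[float, float, float]


def world_to_camera(R: Sequence[Sequence[float]], t: Sequence[float],
                    p_w: Sequence[float]) -> Vec3:
    """Hypothetical helper: P_c = R @ P_w + t (assumed world-to-camera convention)."""
    X, Y, Z = (sum(R[i][k] * p_w[k] for k in range(3)) + t[i] for i in range(3))
    return (X, Y, Z)


def to_normalized_plane(p_c: Vec3) -> Optional[Vec3]:
    """Divide by the depth Z; reject points that are not in front of the camera."""
    X, Y, Z = p_c
    if Z <= 0:
        return None  # behind (or level with) the camera: not captured
    return (X / Z, Y / Z, 1.0)


# Hypothetical pose: identity rotation, t = (0, 0, 2), so camera depth = world Z + 2.
R = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]]
t = [0.0, 0.0, 2.0]

print(to_normalized_plane(world_to_camera(R, t, (1.0, 2.0, 2.0))))  # (0.25, 0.5, 1.0)

behind = world_to_camera(R, t, (-1.0, -2.0, -4.0))  # camera-frame Z = -2.0
print(to_normalized_plane(behind))                  # None
# Without the check, the division still "works" and gives a plausible-looking point:
X, Y, Z = behind
print((X / Z, Y / Z, 1.0))                          # (0.5, 1.0, 1.0)
```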
This plane is also called the normalized plane. We normalize the coordinates and then multiply them by the intrinsic matrix, yielding the pixel coordinates. We can also consider the pixel coordinates as the result of quantitative measurements of points on the normalized plane. If the camera coordinates are all multiplied by the same non-zero constant, the normalized coordinates stay the same, which means that the depth is lost during the projection process. So, in monocular vision, a pixel's depth value cannot be obtained from a single image.
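A sketch of the last step and of the depth ambiguity, assuming the usual pinhole intrinsics with zero skew, so $u = f_x X/Z + c_x$ and $v = f_y Y/Z + c_y$, where $f_x, f_y$ are focal lengths in pixels and $(c_x, c_y)$ is the principal point. The intrinsic values and the point are made up. Scaling the camera-frame point by a positive constant (1, 3, or 10 here) moves it along the same ray and lands on the same pixel, which is why one image alone can't recover depth. Negative constants are not shown because the sketch rejects points with $Z \le 0$.

```python
from typing import Sequence, Tuple


def project(K: Sequence[Sequence[float]], p_c: Sequence[float]) -> Tuple[float, float]:
    """Hypothetical helper: camera-frame point -> normalized plane -> pixel (u, v)."""
    X, Y, Z = p_c
    if Z <= 0:
        raise ValueError("point is not in front of the camera")
    x, y = X / Z, Y / Z                   # normalized coordinates (z = 1 plane)
    fx, cx = K[0][0], K[0][2]
    fy, cy = K[1][1], K[1][2]
    return (fx * x + cx, fy * y + cy)     # first two rows of K @ [x, y, 1] (zero skew)


# Made-up intrinsics: 500 px focal length, principal point at (320, 240).
K = [[500.0, 0.0, 320.0],
     [0.0, 500.0, 240.0],
     [0.0, 0.0, 1.0]]

p_c = (0.5, -0.25, 2.0)
for s in (1.0, 3.0, 10.0):
    scaled = tuple(s * c for c in p_c)    # same ray, depth 2, 6, 20
    print(s, project(K, scaled))          # (445.0, 177.5) every time
```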