Camera Intrinsics

These are calculated as part of the Camera Calibration process. The parameters are expressed through the Calibration Matrix.

At NVIDIA, I saw that these can be obtained from the camera manufacturer through an EEPROM.

Distinction with Cyrill Stachniss

In my Cyrill Stachniss Camera Calibration notes, I have an extra coordinate system: the sensor plane. There, the image plane represents the idealized projection, whereas the sensor plane represents the actual measurements (it corrects for the Principal Point).

Below, the intrinsics map points directly to the sensor (pixel) plane, with the principal point offset already included.

Copy Pasted from Pinhole Camera Geometry

Below, I derive how we go from a 3D point (expressed in the camera frame) $\to$ image coordinates using the camera intrinsics. In my Camera Calibration notes, we go from world $\to$ image $\to$ sensor coordinates.

This is the Optical Frame; the drawing on the right is a bird's-eye view.

Let $O - x - y - z$ be the camera coordinate system.

The 3D point $P$, after being projected through the hole $O$, falls on the physical imaging plane $O^{'} - x^{'} - y^{'}$ and produces the image point $P^{'}$.

We define the following:

the 3D point $P = [X, Y, Z]^{T}$
the image point $P^{'} = [X^{'}, Y^{'}, Z^{'}]^{T}$
$f$ is the physical distance from the imaging plane to the camera plane (the focal length)

Then, according to the similarity of the triangles,

$$\frac{Z}{f} = -\frac{X}{X^{'}} = -\frac{Y}{Y^{'}}$$

The negative sign indicates that the image is inverted. We can equivalently place the imaging plane symmetrically in front of the camera.

This removes the negative sign and makes the formula more compact:

$$\frac{Z}{f} = \frac{X}{X^{'}} = \frac{Y}{Y^{'}}$$

So we get

$$X^{'} = f \frac{X}{Z}, \quad Y^{'} = f \frac{Y}{Z}$$
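A minimal Python sketch of the projection just derived, assuming the point is already expressed in the camera frame $O - x - y - z$ and lies in front of the camera ($Z > 0$). The function name `project_to_image_plane` and its `virtual` flag are hypothetical: `virtual=True` uses the plane placed in front of the camera ($X^{'} = fX/Z$), while `virtual=False` keeps the physical, inverted plane from the similar-triangles relation.

```python
def project_to_image_plane(P, f, virtual=True):
    """Project a camera-frame point P = (X, Y, Z) onto the imaging plane.

    f is the focal length in meters; the returned (X', Y') is also in meters.
    """
    X, Y, Z = P
    if Z <= 0:
        # Assumption: only points in front of the camera are projected.
        raise ValueError("point must be in front of the camera (Z > 0)")
    # The physical plane behind O gives an inverted image (negative sign);
    # the virtual plane in front of O removes the sign.
    sign = 1.0 if virtual else -1.0
    return sign * f * X / Z, sign * f * Y / Z


# Example: a 4 mm lens and a point 2 m in front of the camera.
x_img, y_img = project_to_image_plane((0.5, -0.25, 2.0), f=0.004)
print(x_img, y_img)  # approximately 0.001 and -0.0005 (meters)
```

The result is still in meters on the imaging plane; converting to pixels needs the extra scaling and offset described next.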

So where do $f_x$ and $f_y$ come from?

$f_{x}$ and $f_{y}$ are the same if the pixels are square (no stretching). However, if a pixel's width and height differ, the pixel is rectangular and $f_{x} \neq f_{y}$.

That's why Cyrill Stachniss represents the second focal length as $f$ times a scale-difference factor ($c(1+m)$ in his notation) rather than as an independent parameter.

Pixel Coordinates

To describe how the sensor converts the perceived light into image pixels, we set a pixel plane $o - u - v$ fixed on the physical imaging plane.

Between the pixel coordinate system and the imaging plane, there is a scaling and a translation of the origin:

pixel coordinates are scaled by $α$ on the $u$ axis and by $β$ on the $v$ axis
the origin is translated by $[c_{x}, c_{y}]^{T}$

Then, the relationship between the coordinates of $P^{'}$ and the pixel coordinate $[u, v]^{T}$ is:

$$u = \alpha X^{'} + c_{x}, \quad v = \beta Y^{'} + c_{y}$$
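The same imaging-plane-to-pixel mapping as a small Python function. The name `image_plane_to_pixel` is hypothetical, and the example numbers (a 2 µm square pixel, so $\alpha = \beta = 5 \times 10^{5}$ pixels/meter, and a principal point at $(640, 360)$) are illustrative assumptions, not values from a specific camera.

```python
def image_plane_to_pixel(x_img, y_img, alpha, beta, cx, cy):
    """Map imaging-plane coordinates (meters) to pixel coordinates (u, v).

    alpha, beta: scale along u and v, in pixels/meter.
    cx, cy: translation of the origin (principal point), in pixels.
    """
    u = alpha * x_img + cx
    v = beta * y_img + cy
    return u, v


# Illustrative values: 2 um square pixels, principal point at (640, 360).
u, v = image_plane_to_pixel(0.001, -0.0005, alpha=5e5, beta=5e5, cx=640.0, cy=360.0)
print(u, v)  # approximately 1140.0 and 110.0
```

Note that the point at the origin of the imaging plane lands exactly on $(c_{x}, c_{y})$, which is what the translation term encodes.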

We then replace $X^{'}$ and $Y^{'}$ with the values found previously:

$$\begin{cases} u = f_{x} \frac{X}{Z} + c_{x} \\ v = f_{y} \frac{Y}{Z} + c_{y} \end{cases}$$

where $f_{x} = \alpha f$ and $f_{y} = \beta f$.

$f$ is the focal length in meters
$α$ and $β$ are in pixels/meter, so $f_{x}, f_{y}$ and $c_{x}, c_{y}$ are in pixels.
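To make the units concrete, here is a sketch that derives $f_{x} = \alpha f$ and $f_{y} = \beta f$ from a metric focal length and pixel size, then evaluates the projection equations above. It assumes $\alpha$ and $\beta$ are simply the reciprocals of the pixel width and height (no binning or other resampling); the function names and sensor numbers are hypothetical.

```python
def focal_lengths_in_pixels(f_m, pixel_w_m, pixel_h_m):
    """Return (f_x, f_y) in pixels from a focal length and pixel size in meters."""
    alpha = 1.0 / pixel_w_m  # pixels/meter along u (assumed = 1 / pixel width)
    beta = 1.0 / pixel_h_m   # pixels/meter along v (assumed = 1 / pixel height)
    return alpha * f_m, beta * f_m


def project_to_pixel(P, fx, fy, cx, cy):
    """u = f_x X/Z + c_x, v = f_y Y/Z + c_y for a camera-frame point P."""
    X, Y, Z = P
    return fx * X / Z + cx, fy * Y / Z + cy


# Rectangular pixels (2 um wide, 2.5 um tall) give f_x != f_y.
fx, fy = focal_lengths_in_pixels(0.004, 2e-6, 2.5e-6)
print(fx, fy)  # approximately 2000.0 and 1600.0 pixels

u, v = project_to_pixel((0.5, -0.25, 2.0), fx, fy, cx=640.0, cy=360.0)
print(u, v)  # approximately 1140.0 and 160.0
```

Meters cancel in $f_{x} = \alpha f$ (pixels/meter times meters), and $X/Z$ is dimensionless, so $u$ and $v$ come out in pixels.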

We can write this in matrix form. As in most books, we move $Z$ to the left-hand side:

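$$Z \begin{bmatrix} u \\ v \\ 1 \end{bmatrix} = \begin{bmatrix} f_{x} & 0 & c_{x} \\ 0 & f_{y} & c_{y} \\ 0 & 0 & 1 \end{bmatrix} \begin{bmatrix} X \\ Y \\ Z \end{bmatrix} \overset{\Delta}{=} KP$$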
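A dependency-free sketch of the matrix form: build $K$, compute $KP = Z[u, v, 1]^{T}$, and divide by the last entry to recover $(u, v)$. Plain nested lists are used instead of a linear-algebra library so the example stays self-contained; the helper names and intrinsic values are hypothetical.

```python
def intrinsic_matrix(fx, fy, cx, cy):
    """K = [[f_x, 0, c_x], [0, f_y, c_y], [0, 0, 1]]."""
    return [[fx, 0.0, cx],
            [0.0, fy, cy],
            [0.0, 0.0, 1.0]]


def mat_vec(M, x):
    """Multiply a 3x3 matrix (nested lists) by a 3-vector."""
    return [sum(M[i][j] * x[j] for j in range(3)) for i in range(3)]


def project_with_K(K, P):
    """Return (u, v) from Z [u, v, 1]^T = K P."""
    zu, zv, z = mat_vec(K, P)  # the third row of K just copies Z
    if z <= 0:
        raise ValueError("point must be in front of the camera (Z > 0)")
    return zu / z, zv / z


K = intrinsic_matrix(fx=500.0, fy=500.0, cx=320.0, cy=240.0)
print(project_with_K(K, [0.2, -0.1, 2.0]))  # (370.0, 215.0)
```

Expanding the first row gives $Zu = f_{x}X + c_{x}Z$, which is the scalar equation for $u$ multiplied through by $Z$.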

Notice that in $Z[u, v, 1]^{T} = KP$, the LHS is a homogeneous 2D coordinate (the pixel $[u, v]^{T}$ scaled by the depth $Z$), while $P$ on the RHS is a non-homogeneous (Cartesian) 3D point
$K$ denotes the camera's intrinsic (inner) parameter matrix, which has 4 parameters: $f_{x}, f_{y}, c_{x}, c_{y}$
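Because the LHS is homogeneous, every point on the ray through $O$ and $P$ lands on the same pixel: scaling $P$ by any $\lambda > 0$ scales $KP$ by $\lambda$, and the division by the last entry cancels it, so the projection discards depth. The sketch below checks this with the 4 intrinsics written out explicitly; `apply_K`, `dehomogenize`, and the numbers are hypothetical.

```python
def apply_K(fx, fy, cx, cy, P):
    """Return K P = [f_x X + c_x Z, f_y Y + c_y Z, Z] (a homogeneous pixel)."""
    X, Y, Z = P
    return [fx * X + cx * Z, fy * Y + cy * Z, Z]


def dehomogenize(h):
    """Convert homogeneous 2D coordinates [a, b, w] to Cartesian (a / w, b / w)."""
    a, b, w = h
    if w == 0:
        raise ValueError("w = 0: point at infinity, no finite pixel")
    return a / w, b / w


intrinsics = (500.0, 500.0, 320.0, 240.0)  # f_x, f_y, c_x, c_y
P = (0.2, -0.1, 2.0)
for lam in (1.0, 2.0, 10.0):
    scaled = tuple(lam * c for c in P)  # another point on the same ray
    print(lam, dehomogenize(apply_K(*intrinsics, scaled)))
```

All three scaled points print the same pixel, up to floating-point rounding.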

Camera Extrinsics

🛠️ Steven Gong


