Pinhole Camera Geometry

# Camera Intrinsics

The camera intrinsics are estimated as part of the Camera Calibration process. The parameters are expressed through the Calibration Matrix.

At NVIDIA, I saw that these can be obtained from the camera manufacturer through an EEPROM.

Distinction with Cyrill Stachniss

In my Cyrill Stachniss Camera Calibration notes, I have an extra coordinate system, the sensor plane. In there, the image plane represents the idealized projection, whereas the sensor plane represents the actual measurements (corrects for Principal Point).

Below, the projected points go directly to the sensor plane.

Below, I derive how we go from world $→$ image coordinate for camera intrinsics. In my Camera Calibration, we go from world $→$ image $→$ sensor coordinate system.

- This is the Optical Frame; the drawing on the right is a bird's-eye view

Let $O-x-y-z$ be the camera coordinate system.

The 3D point $P$, after being projected through the pinhole $O$, falls on the physical imaging plane $O'-x'-y'$ and produces the image point $P'$.

We define the following:

- the 3D point $P=[X,Y,Z]^T$
- the image point $P'=[X',Y',Z']^T$
- $f$ is the physical distance from the imaging plane to the pinhole $O$ (the focal length)

Then, according to the similarity of the triangles,

$$\begin{equation}
\frac{Z}{f} = -\frac{X}{{X'}} =-\frac{Y}{{Y'}}
\end{equation}$$
The negative sign indicates that the image is inverted. We can equivalently place the imaging plane symmetrically in front of the camera.
![[attachments/Screenshot 2023-07-09 at 5.46.02 PM.png]]
This removes the negative sign in the formula to make it more compact:
$$\begin{equation}
\frac{Z}{f} = \frac{X}{{X'}} =\frac{Y}{{Y'}}
\end{equation}$$
So we get
$$X' = f\frac{X}{Z}, \quad Y' = f\frac{Y}{Z}$$
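The projection above can be sketched in a few lines of Python. This is a minimal illustration, not a reference implementation: the function name `project_to_image_plane` and the numeric values are made up for the example, and it assumes the point is already expressed in the camera frame with the imaging plane placed in front of the camera (no sign flip).

```python
def project_to_image_plane(point, f):
    """Project a camera-frame point [X, Y, Z] onto the front imaging plane.

    Implements X' = f * X / Z and Y' = f * Y / Z.
    The result is in the same length unit as f.
    """
    X, Y, Z = point
    if Z <= 0:
        # Assumption for this sketch: only points in front of the camera are valid.
        raise ValueError("point must be in front of the camera (Z > 0)")
    return f * X / Z, f * Y / Z


# Hypothetical values: a point 2 m in front of the camera, focal length 0.05 m.
x_img, y_img = project_to_image_plane((0.4, -0.2, 2.0), f=0.05)
print(x_img, y_img)
```

Note that doubling $X$, $Y$ and $Z$ together gives the same $(X', Y')$, which is why depth cannot be recovered from a single image point.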
#### Pixel Coordinates
To describe how the sensor converts the perceived light into image pixels, we set a pixel plane $o-u-v$ fixed on the physical imaging plane.
Between the pixel coordinate system and the imaging plane, there is an apparent zoom and a translation of the origin:
- pixel coordinates are scaled by $\alpha$ on the $u$ axis and by $\beta$ on the $v$ axis
- the origin is translated by $[c_x, c_y]^T$
Then, the relationship between the coordinates of $P'$ and the pixel coordinate $[u,v]^T$ is:
$$\begin{equation}
\left\{
\begin{matrix}
u=\alpha X' + c_x\\
v=\beta Y' + c_y
\end{matrix}
\right.
\end{equation}$$
We then replace $X'$ and $Y'$ with the values found previously, merging $\alpha f$ into $f_x$ and $\beta f$ into $f_y$:
$$\begin{equation}
\left\{
\begin{matrix}
u=f_x\frac{X}{Z} + c_x\\
v=f_y\frac{Y}{Z} + c_y
\end{matrix}
\right. ,
\end{equation}$$
- $f$ is the focal length in meters
- $\alpha$ and $\beta$ are in pixels/meter, so $f_x, f_y$ and $c_x, c_y$ are in pixels.
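As a rough sketch of the pixel mapping, the snippet below goes from a camera-frame point to pixel coordinates. It assumes $f_x = \alpha f$ and $f_y = \beta f$, which is what substituting $X' = fX/Z$ and $Y' = fY/Z$ into the scaling equations implies. The function name `camera_to_pixel` and all numeric values (a 4 mm lens, 4 µm square pixels, a 640×480 image centre) are hypothetical.

```python
def camera_to_pixel(point, f, alpha, beta, cx, cy):
    """Map a camera-frame point [X, Y, Z] (meters) to pixel coordinates (u, v).

    f      : focal length in meters
    alpha  : scale along u in pixels/meter
    beta   : scale along v in pixels/meter
    cx, cy : origin translation in pixels
    """
    X, Y, Z = point
    if Z <= 0:
        # Assumption for this sketch: only points in front of the camera are valid.
        raise ValueError("point must be in front of the camera (Z > 0)")
    fx = alpha * f  # meters * pixels/meter = pixels
    fy = beta * f
    u = fx * X / Z + cx
    v = fy * Y / Z + cy
    return u, v


# Hypothetical camera: f = 4 mm, 4 micrometer pixels (250000 px/m), centre (320, 240).
u, v = camera_to_pixel((0.1, 0.05, 2.0), f=0.004,
                       alpha=250_000.0, beta=250_000.0, cx=320.0, cy=240.0)
print(u, v)
```

With these made-up numbers $f_x = f_y \approx 1000$ pixels, so the point lands at roughly $u = 370$, $v = 265$.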
We can write this as a matrix. Let's put $Z$ to the left side as in most books:
$$\begin{equation} Z \begin{pmatrix} u\\ v\\ 1 \end{pmatrix}= \begin{pmatrix} f_x & 0&c_x \\ 0& f_y& c_y\\ 0&0 & 1 \end{pmatrix}\begin{pmatrix} X\\ Y\\ Z \end{pmatrix} \buildrel \Delta \over = \mathbf{K} \mathbf{P} \end{equation}$$
- Notice that LHS is a [[notes/Homogeneous Coordinate|Homogeneous]] 2D coordinate, while RHS is non-homogeneous (Cartesian coordinates)
- $\mathbf{K}$ denotes the camera's **inner parameter matrix** (the intrinsic matrix); in this model it has 4 parameters: $f_x, f_y, c_x, c_y$
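The matrix form can be checked with a small sketch in plain Python (no external libraries). It computes $\mathbf{K}\mathbf{P}$, which equals $Z\,[u, v, 1]^T$, and then divides by the last component to recover $(u, v)$. The helper names `intrinsic_matrix` and `project_with_K` and the parameter values are illustrative assumptions, not part of any library.

```python
def intrinsic_matrix(fx, fy, cx, cy):
    """Build the 3x3 matrix K from the four intrinsic parameters (all in pixels)."""
    return [[fx, 0.0, cx],
            [0.0, fy, cy],
            [0.0, 0.0, 1.0]]


def project_with_K(K, point):
    """Return pixel coordinates (u, v) for a camera-frame point [X, Y, Z]."""
    # K @ P gives Z * [u, v, 1]^T, a homogeneous 2D coordinate.
    h = [sum(K[i][j] * point[j] for j in range(3)) for i in range(3)]
    Z = h[2]
    if Z <= 0:
        # Assumption for this sketch: only points in front of the camera are valid.
        raise ValueError("point must be in front of the camera (Z > 0)")
    # Dividing by the last component converts back to Cartesian pixel coordinates.
    return h[0] / Z, h[1] / Z


# Hypothetical intrinsics in pixels.
K = intrinsic_matrix(fx=1000.0, fy=1000.0, cx=320.0, cy=240.0)
print(project_with_K(K, (0.1, 0.05, 2.0)))
print(project_with_K(K, (0.2, 0.10, 4.0)))  # same ray, twice as far away
```

Both calls should give the same pixel (up to floating-point rounding), since the two points lie on the same ray through $O$ and the homogeneous coordinate is only defined up to the scale $Z$.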
### Related
- [[notes/Camera Extrinsics|Camera Extrinsics]]