Visual SLAM

SLAM Formalization

Formalizing

Context

The formalization below will be a little abstract, so here is some more context.

  • There are discrete timesteps $1, \ldots, K$ at which data sampling happens.
  • We use $\mathbf{x}$ to indicate the positions of the robot, so the positions at different time steps can be written as $\mathbf{x}_1, \ldots, \mathbf{x}_K$ (the trajectory of the robot).
  • The map is made up of several landmarks, and at each time step the sensors can see some of the landmarks and record their observations. Assume there is a total of $N$ landmarks in the map; we use $\mathbf{y}_1, \ldots, \mathbf{y}_N$ to denote them.

These are just high-level abstract formalizations; see page 17.

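Motion equation: $\mathbf{x}_k = f\left(\mathbf{x}_{k-1}, \mathbf{u}_k, \mathbf{w}_k\right)$, where

  • $\mathbf{x}_k$ is the position at timestep $k$
  • $\mathbf{u}_k$ is the input command
  • $\mathbf{w}_k$ is the noise, which we assume is Gaussian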

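Observation equation: $\mathbf{z}_{k,j} = h\left(\mathbf{y}_j, \mathbf{x}_k, \mathbf{v}_{k,j}\right)$, where

  • $\mathbf{z}_{k,j}$ is the observation data
  • $\mathbf{y}_j$ is a landmark point seen at $\mathbf{x}_k$
  • $\mathbf{v}_{k,j}$ is the noise in this observation, which we assume is Gaussian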

On the observation equation

“the robot sees a landmark point $\mathbf{y}_j$ at $\mathbf{x}_k$ and generates an observation data $\mathbf{z}_{k,j}$”

Confusion

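I had trouble wrapping my head around the observation equation:

  • The sensor measurement $\mathbf{z}_{k,j}$ (the dependent variable) is calculated from the landmark position $\mathbf{y}_j$ and the robot position $\mathbf{x}_k$.
  • This looks counterintuitive, because $\mathbf{z}_{k,j}$ is the quantity we actually know, so it seems like it should be the independent variable (on the right side, not the left side).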

ChatGPT: In a modeling context, $\mathbf{z}_{k,j}$ is dependent because it is the result of the underlying system state $\mathbf{x}_k$ and the landmark positions $\mathbf{y}_j$. This setup helps in developing algorithms to estimate the unknown states $\mathbf{x}_k$ from the known observations $\mathbf{z}_{k,j}$.

  • If you look at the way Kalman Filters are formalized, it’s the same thing
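For reference, the linear-Gaussian model that a Kalman filter assumes can be written as below. The matrix names $\mathbf{F}_k$, $\mathbf{B}_k$, $\mathbf{H}_k$ and the covariances $\mathbf{Q}_k$, $\mathbf{R}_k$ are the usual textbook labels and are an assumption here, not notation taken from these notes.

$$\begin{equation}
\mathbf{x}_k = \mathbf{F}_k \mathbf{x}_{k-1} + \mathbf{B}_k \mathbf{u}_k + \mathbf{w}_k, \qquad
\mathbf{z}_k = \mathbf{H}_k \mathbf{x}_k + \mathbf{v}_k, \qquad
\mathbf{w}_k \sim \mathcal{N}(\mathbf{0}, \mathbf{Q}_k), \quad
\mathbf{v}_k \sim \mathcal{N}(\mathbf{0}, \mathbf{R}_k).
\end{equation}$$

Here too the measurement $\mathbf{z}_k$ sits on the left, written as a function of the hidden state. The equation describes how the data is generated; the filter's job is to run that relationship backwards and recover $\mathbf{x}_k$ from $\mathbf{z}_k$.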

The basic SLAM problem

These two equations together describe a basic SLAM problem: given the noisy control inputs $\mathbf{u}$ and the sensor readings $\mathbf{z}$, how do we estimate $\mathbf{x}$ (localization) and $\mathbf{y}$ (mapping)?

So the 4 important variables:

  • the pose of the camera in world frame
  • the pose of the landmarks in world frame
  • the pose of the camera in odom frame
  • the pose of the landmarks in camera frame (need to transform it into odom frame)

Now, as we see, we have modelled the SLAM problem as a State Estimation problem: How to estimate the internal, hidden state variables through the noisy measurement data?
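To make the state-estimation framing concrete, here is a minimal 1D sketch of which quantities are hidden and which are known. Everything in it is an assumption made for illustration: the function names `simulate` and `naive_estimate`, the unit displacement commands, the known start at 0, and the observation model "landmark position minus robot position". The estimator is deliberately naive (dead reckoning plus averaging) and is not how a real SLAM back end works.

```python
import random


def simulate(num_steps, landmarks, motion_std, obs_std, rng):
    """Generate ground truth and noisy data for a 1D robot.

    Hidden: positions x_1..x_K and the landmark positions.
    Known:  inputs u_1..u_K and observations z[k][j].
    """
    u = [1.0] * num_steps          # commanded displacement per step (assumed)
    x, pos = [], 0.0               # the start x_0 = 0 is assumed known
    for k in range(num_steps):
        # motion equation: x_k = x_{k-1} + u_k + w_k
        pos = pos + u[k] + rng.gauss(0.0, motion_std)
        x.append(pos)
    # observation equation: z_{k,j} = y_j - x_k + v_{k,j}
    z = [[y - xk + rng.gauss(0.0, obs_std) for y in landmarks] for xk in x]
    return x, u, z


def naive_estimate(u, z):
    """Estimate the hidden states using only the known data u and z."""
    # Localization: integrate the commands, ignoring the motion noise.
    x_hat, pos = [], 0.0
    for uk in u:
        pos += uk
        x_hat.append(pos)
    # Mapping: each observation gives one guess y_j ~ z_{k,j} + x_hat_k;
    # average the guesses over all time steps.
    num_landmarks = len(z[0])
    y_hat = [
        sum(z[k][j] + x_hat[k] for k in range(len(u))) / len(u)
        for j in range(num_landmarks)
    ]
    return x_hat, y_hat


rng = random.Random(0)
x_true, u, z = simulate(5, [2.0, 7.5], motion_std=0.05, obs_std=0.1, rng=rng)
x_hat, y_hat = naive_estimate(u, z)
print("true trajectory:     ", x_true)
print("estimated trajectory:", x_hat)
print("estimated landmarks: ", y_hat)
```

Note that `naive_estimate` never touches `x_true` or the true landmark positions; it only sees `u` and `z`, which is exactly the situation the SLAM problem describes.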

Example

Depending on the actual motion and the type of sensor, there are several kinds of parameterization methods. What is parameterization?

**Motion equation** example

Suppose our robot moves in a plane. Its pose is then described by two coordinates and an angle, i.e., $\mathbf{x}_k = [x_1, x_2, \theta]_k^\mathrm{T}$, where $x_1, x_2$ are the positions on the two axes and $\theta$ is the angle. At the same time, the input command is the change in position and angle over the time interval: $\mathbf{u}_k = [\Delta x_1, \Delta x_2, \Delta \theta]_k^\mathrm{T}$, so the motion equation can be parameterized as:

$$\begin{equation}
{\left[ \begin{array}{l} x_1\\ x_2\\ \theta \end{array} \right]_k} = {\left[ \begin{array}{l} x_1\\ x_2\\ \theta \end{array} \right]_{k - 1}} + {\left[ \begin{array}{l} \Delta x_1\\ \Delta x_2\\ \Delta \theta \end{array} \right]_k} + {\mathbf{w}_k},
\end{equation}$$

where $\mathbf{w}_k$ is the noise again. This is a simple linear relationship. However, not all input commands are position and angular changes. For example, the input from a "throttle" or "joystick" is a speed or an acceleration, so there are other, more complex forms of the motion equation. In those cases, a kinematic analysis is required.

**Observation equation** example

Imagine that the robot carries a two-dimensional laser sensor. We know that a laser observes a 2D landmark by measuring two quantities: the distance $r$ between the landmark point and the robot, and the angle $\phi$. Let's say the landmark is at $\mathbf{y}_j = [y_1, y_2]_j^\mathrm{T}$, the robot position is $\mathbf{x}_k=[x_1,x_2]_k^\mathrm{T}$, and the observed data is $\mathbf{z}_{k,j} = [r_{k,j}, \phi_{k,j}]^\mathrm{T}$. Then the observation equation is written as:

$$\begin{equation}
\left[ \begin{array}{l} r_{k,j}\\ \phi_{k,j} \end{array} \right] = \left[ \begin{array}{l} \sqrt {{{\left(y_{1,j} - x_{1,k} \right)}^2} + {{\left( {{y_{2,j}} - x_{2,k} } \right)}^2}} \\ \arctan \left( \frac{{y_{2,j}} - x_{2,k}}{{y_{1,j} - x_{1,k}}} \right) \end{array} \right] + \mathbf{v}_{k, j}.
\end{equation}$$

In visual SLAM the sensor is a camera, so the observation equation is a process like "getting the pixels in the image of the landmarks."
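The two parameterized equations above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function names `motion_model` and `observation_model` and the noise parameters are made up for this example, the noise is drawn independently per component, and `math.atan2` is swapped in for the plain $\arctan$ of the formula so that the bearing is well defined when the landmark is directly above or behind the robot. As in the formula, the robot heading $\theta$ is not used in the observation.

```python
import math
import random


def motion_model(pose, u, noise_std=0.0, rng=random):
    """Planar motion equation: x_k = x_{k-1} + u_k + w_k.

    pose is (x1, x2, theta) and u is (dx1, dx2, dtheta).
    """
    return tuple(p + du + rng.gauss(0.0, noise_std) for p, du in zip(pose, u))


def observation_model(landmark, pose, range_std=0.0, bearing_std=0.0, rng=random):
    """Range and bearing of a 2D landmark as seen from position (x1, x2)."""
    dx = landmark[0] - pose[0]
    dy = landmark[1] - pose[1]
    r = math.hypot(dx, dy) + rng.gauss(0.0, range_std)
    # atan2 replaces arctan(dy / dx): it handles dx == 0 and all four quadrants.
    phi = math.atan2(dy, dx) + rng.gauss(0.0, bearing_std)
    return r, phi


# Noise-free usage: move one unit along the first axis, then look at a landmark.
pose = motion_model((0.0, 0.0, 0.0), (1.0, 0.0, 0.0))
r, phi = observation_model((4.0, 4.0), pose)
print(pose, r, phi)
```

With both noise levels at zero, the robot ends up at $(1, 0)$ and the landmark at $(4, 4)$ is offset by $(3, 4)$, so the range is 5. Passing a seeded `random.Random` instance as `rng` makes noisy runs repeatable.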