# Visual SLAM

Visual SLAM uses cameras to construct a map of the environment while simultaneously localizing within it, typically by matching features across frames (see ORB-SLAM). Read up on the book: https://github.com/gaoxiang12/slambook-en/blob/master/slambook-en.pdf

- Available locally file:///Users/stevengong/My%20Drive/Books/Coding/slambook-en.pdf

Tesla does something SLAM-like by fusing multiple camera views into a bird's-eye view of the surroundings.

Watch this: https://www.youtube.com/watch?v=saVZtgPyyJQ

https://www.kudan.io/blog/camera-basics-visual-slam/

The core intuition: you try to figure out where features seen in different frames align.

### Implementation

#### Classical Visual SLAM Stack

A typical visual SLAM workflow includes the following steps:

- *Sensor data acquisition*. In visual SLAM, this mainly refers to the acquisition and preprocessing of camera images. For a mobile robot, this also includes the acquisition and synchronization of motor encoders, IMU sensors, etc.
- *Visual odometry (VO)*. VO's task is to estimate the camera movement between adjacent frames (ego-motion) and generate a rough local map. VO is also known as the frontend.
- *Backend filtering/optimization*. The backend receives camera poses at different timestamps from VO and results from loop closing, then applies optimization to generate a fully optimized trajectory and map. Because it is connected after the VO, it is also known as the backend.
- *Loop closing*. Loop closing determines whether the robot has returned to a previous position in order to reduce the accumulated drift. If a loop is detected, it provides information to the backend for further optimization.
- *Reconstruction*. It constructs a task-specific map based on the estimated camera trajectory.
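The workflow above can be sketched as a loop. This is only a skeleton with hypothetical stub functions (`acquire_frame`, `visual_odometry`, `detect_loop`, `backend_optimize` are invented names, not from any real library); a real system would use e.g. ORB features in the frontend and a graph optimizer in the backend.

```python
# Minimal sketch of the classical visual SLAM loop (all functions are stubs).

def acquire_frame(t):
    # Sensor data acquisition: grab (and preprocess) a camera image.
    return {"t": t, "image": None}

def visual_odometry(prev_frame, frame):
    # Frontend: estimate ego-motion between adjacent frames.
    return {"dx": 1.0, "dy": 0.0}  # dummy relative pose

def detect_loop(keyframes, frame):
    # Loop closing: has the robot returned to a previous place?
    return None  # no loop in this toy example

def backend_optimize(poses, loops):
    # Backend: globally optimize the trajectory given VO + loop constraints.
    return poses  # identity "optimization" for the sketch

poses, keyframes, prev = [{"x": 0.0, "y": 0.0}], [], None
for t in range(5):
    frame = acquire_frame(t)
    if prev is not None:
        rel = visual_odometry(prev, frame)
        last = poses[-1]
        poses.append({"x": last["x"] + rel["dx"], "y": last["y"] + rel["dy"]})
    loop = detect_loop(keyframes, frame)
    poses = backend_optimize(poses, [loop] if loop else [])
    keyframes.append(frame)
    prev = frame

print(len(poses))  # 5 poses: the initial pose + 4 VO updates
```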

- frontend → more relevant to computer vision topics (image feature extraction and matching)
- backend → the state estimation research area

#### Formalizing

Context

The formalization below will be a little abstract, so here is some more context.

- There are discrete timesteps $1, \dots, k$, at which data sampling happens
- We use $x$ to indicate positions of the robot, so the positions at different time steps can be written as $x_{1}, \dots, x_{k}$ (the trajectory of the robot)
- The map is made up of several landmarks, and at each time step the sensors can see some of the landmarks and record their observations. Assume there is a total of $N$ landmarks in the map, and we will use $y_{1}, \dots, y_{N}$ to denote them.

These are just high-level abstract formalizations; see page 17.

**Motion Equation** (this is like the controller, can be obtained from IMU)
$x_{k}=f(x_{k-1},u_{k},w_{k})$
where

- $x_{k}$ is position at timestep $k$
- $u_{k}$ is the input commands
- $w_{k}$ is noise
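A toy instance of the motion equation, assuming a planar robot with pose $(x, y, \theta)$, a (distance, turn) command, and additive Gaussian noise (the additive-noise form is an assumption for this sketch; `f` here is just one possible parameterization):

```python
import math, random

random.seed(0)

# Toy motion equation x_k = f(x_{k-1}, u_k, w_k) for a planar robot.
def f(x_prev, u_k, w_k):
    x, y, theta = x_prev
    d, dtheta = u_k               # command: drive distance d, then turn dtheta
    nx, ny, ntheta = w_k          # additive noise (an assumption of this model)
    return (x + d * math.cos(theta) + nx,
            y + d * math.sin(theta) + ny,
            theta + dtheta + ntheta)

x = (0.0, 0.0, 0.0)
for k in range(4):
    u = (1.0, math.pi / 2)        # drive 1 m, turn 90 degrees
    w = tuple(random.gauss(0, 0.01) for _ in range(3))
    x = f(x, u, w)

# With zero noise the robot would traverse a unit square back to the origin;
# with small noise it ends up close to it.
print(abs(x[0]) < 0.1 and abs(x[1]) < 0.1)
```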

**Observation equation** (this comes from the camera)

$z_{k,j}=h(y_{j},x_{k},v_{k,j})$ where

- $z_{k,j}$ is observation data
- $y_{j}$ is the landmark point being observed from pose $x_{k}$
- $v_{k,j}$ is the noise in this observation
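A concrete instance of the observation equation, assuming a range-bearing sensor (one common choice of $h$; the specific model is an assumption, not the only option):

```python
import math

# Toy observation equation z_{k,j} = h(y_j, x_k, v_{k,j}):
# a range-bearing sensor observing landmark y_j from pose x_k.
def h(y_j, x_k, v_kj):
    lx, ly = y_j
    x, y, theta = x_k
    r_noise, b_noise = v_kj
    r = math.hypot(lx - x, ly - y) + r_noise          # range to the landmark
    b = math.atan2(ly - y, lx - x) - theta + b_noise  # bearing relative to heading
    return (r, b)

y_j = (3.0, 4.0)             # landmark position (unknown in real SLAM)
x_k = (0.0, 0.0, 0.0)        # robot pose (also unknown in real SLAM)
z = h(y_j, x_k, (0.0, 0.0))  # noise-free for clarity
print(round(z[0], 2), round(z[1], 3))  # range 5.0, bearing atan2(4, 3)
```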

confusion

This abstract equation is kind of confusing. We usually don't have $y_{j}$, nor $x_{k}$; I don't get the point of this equation. The motion equation is much more straightforward. To revisit. (The point seems to be that $h$ is the measurement model: given guesses for $x_{k}$ and $y_{j}$, it predicts what $z_{k,j}$ should be, and the mismatch with the actual measurement is what drives the estimation.)

"the robot sees a landmark point $y_{j}$ at $x_{k}$ and generates an observation data $z_{k,j}$"

$$
\begin{cases}
x_{k}=f(x_{k-1},u_{k},w_{k}), & k=1,\dots,K \\
z_{k,j}=h(y_{j},x_{k},v_{k,j}), & (k,j)\in\mathcal{O}
\end{cases}
$$

These two equations together describe the basic SLAM problem: how do we estimate $x$ (localization) and $y$ (mapping) given the noisy control input $u$ and the sensor reading data $z$?

Now, as we see, we have modelled the SLAM problem as a State Estimation problem: How to estimate the internal, hidden state variables through the noisy measurement data?
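To make the state-estimation framing concrete, here is a toy 1D example (all numbers invented for illustration): two robot positions $x_0, x_1$ and one landmark $y$ are recovered jointly from a noisy odometry reading and two noisy range measurements, by stacking all constraints into one linear least-squares problem.

```python
import numpy as np

# Unknowns: [x0, x1, y]. Measurements (noisy readings of true u=1, z0=3, z1=2):
u, z0, z1 = 1.1, 3.05, 1.9

# Stack the constraints as a linear system A @ [x0, x1, y] = b:
A = np.array([
    [1.0, 0.0, 0.0],    # prior: x0 = 0 (fixes the gauge freedom)
    [-1.0, 1.0, 0.0],   # motion equation:      x1 - x0 = u
    [-1.0, 0.0, 1.0],   # observation equation: y  - x0 = z0
    [0.0, -1.0, 1.0],   # observation equation: y  - x1 = z1
])
b = np.array([0.0, u, z0, z1])

# Least squares blends all noisy constraints into one consistent estimate.
est, *_ = np.linalg.lstsq(A, b, rcond=None)
print(abs(est[0]) < 1e-9, round(float(est[1]), 2), round(float(est[2]), 2))
```

Note how localization ($x_0, x_1$) and mapping ($y$) are solved *jointly*: the two range measurements pull the estimates toward mutual consistency rather than trusting any single sensor reading.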

### Example

Depending on the actual motion and the type of sensor, there are several kinds of parameterization methods, i.e., concrete choices for the form of $f$ and $h$ and for how the state is represented.

**Motion equation** example
For example, suppose our robot moves in a plane; then its pose is described by two $x$-$y$ coordinates and an angle, i.e., $x_{k}=[x_{1},x_{2},\theta]_{k}$, where $x_{1},x_{2}$ are positions on the two axes and $\theta$ is the angle. At the same time, the input command is the position and angle change over the time interval: $u_{k}=[\Delta x_{1},\Delta x_{2},\Delta\theta]_{k}$, so the motion equation can be parameterized as:
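Assuming the simple additive parameterization (new pose = old pose + commanded increment + noise; this additive form is an assumption for the sketch), the planar motion model looks like:

```python
import random

random.seed(1)

# Planar parameterization sketch: x_k = x_{k-1} + u_k + w_k, where the
# state is [x1, x2, theta] and u_k = [dx1, dx2, dtheta] (assumed additive).
def step(x_prev, u_k, sigma=0.02):
    w_k = [random.gauss(0, sigma) for _ in range(3)]  # per-component noise
    return [p + du + w for p, du, w in zip(x_prev, u_k, w_k)]

x = [0.0, 0.0, 0.0]
for _ in range(10):
    x = step(x, [0.1, 0.0, 0.05])  # creep along x1 while slowly turning

print(len(x))  # still a 3-dimensional pose [x1, x2, theta]
```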