Visual SLAM
Uses cameras to construct a map of the environment. Does feature matching; see ORB-SLAM. Read up on the book: https://github.com/gaoxiang12/slambook-en/blob/master/slambook-en.pdf
- Available locally file:///Users/stevengong/My%20Drive/Books/Coding/slambook-en.pdf
Tesla kind of does SLAM by creating a bird's-eye view from multiple camera views.
Watch this: https://www.youtube.com/watch?v=saVZtgPyyJQ

Other resources:
- https://www.kudan.io/blog/camera-basics-visual-slam/

You basically try to figure out where the features align.
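To make "figuring out where the features align" concrete, here is a minimal sketch of matching ORB features between two frames with OpenCV; the image filenames are placeholders, not files from any particular dataset.

```python
# Minimal sketch: ORB feature matching between two frames with OpenCV.
# "frame1.png" / "frame2.png" are placeholder filenames.
import cv2

img1 = cv2.imread("frame1.png", cv2.IMREAD_GRAYSCALE)
img2 = cv2.imread("frame2.png", cv2.IMREAD_GRAYSCALE)

orb = cv2.ORB_create(nfeatures=1000)          # detect up to 1000 ORB keypoints
kp1, des1 = orb.detectAndCompute(img1, None)  # keypoints + binary descriptors
kp2, des2 = orb.detectAndCompute(img2, None)

# Brute-force matcher with Hamming distance (ORB descriptors are binary strings)
matcher = cv2.BFMatcher(cv2.NORM_HAMMING, crossCheck=True)
matches = sorted(matcher.match(des1, des2), key=lambda m: m.distance)

# The matched keypoint pairs are what the frontend uses to estimate ego-motion
# (e.g., via the essential matrix and cv2.recoverPose).
print(f"{len(matches)} matches; best distance = {matches[0].distance}")
```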
Implementation
Classical Visual SLAM Stack
A typical visual SLAM workflow includes the following steps:
- Sensor data acquisition. In visual SLAM, this mainly refers to the acquisition and preprocessing of camera images. For a mobile robot, this also includes the acquisition and synchronization of motor encoders, IMU sensors, etc.
- Visual Odometry: VO's task is to estimate the camera movement between adjacent frames (ego-motion) and generate a rough local map. VO is also known as the frontend.
- Backend filtering/optimization. The backend receives camera poses at different time stamps from VO and results from loop closing, and then applies optimization to generate a fully optimized trajectory and map. Because it is connected after the VO, it is also known as the backend.
- Loop Closing. Loop closing determines whether the robot has returned to its previous position in order to reduce the accumulated drift. If a loop is detected, it will provide information to the backend for further optimization.
- Reconstruction. It constructs a task-specific map based on the estimated camera trajectory.
The frontend is more relevant to computer vision topics (image feature extraction and matching), while the backend is mainly a state estimation research area.
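As a rough sketch of how these stages hand data to each other, here is a toy, runnable skeleton of the loop. All the step functions (`preprocess`, `estimate_ego_motion`, `detect_loop`, `optimize`) are my own placeholder stubs, not any real SLAM library.

```python
# Toy skeleton of the classical visual SLAM loop; every step is deliberately stubbed.
import numpy as np

def preprocess(image):                       # 1. acquisition / preprocessing
    return image.astype(np.float32) / 255.0

def estimate_ego_motion(prev_pose, frame):   # 2. frontend (visual odometry)
    return prev_pose + np.array([0.1, 0.0, 0.0])   # pretend we moved 0.1 m in x

def detect_loop(pose, keyframe_poses):       # 3. loop closing
    return any(np.linalg.norm(pose[:2] - p[:2]) < 0.05 for p in keyframe_poses)

def optimize(trajectory):                    # 4. backend optimization (no-op stub)
    return trajectory

trajectory = [np.zeros(3)]                   # pose = [x, y, theta]
keyframes = [np.zeros(3)]
for _ in range(10):
    frame = preprocess(np.random.randint(0, 256, (480, 640), dtype=np.uint8))
    pose = estimate_ego_motion(trajectory[-1], frame)
    trajectory.append(pose)
    if detect_loop(pose, keyframes):
        trajectory = optimize(trajectory)    # 5. reconstruction would use this trajectory

print(np.round(trajectory[-1], 2))
```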
Formalizing
Context
The formalization below will be a little abstract, so here is some more context.
- There are discrete time steps $k = 1, \dots, K$, at which data sampling happens
- We use $x$ to indicate positions of the robot, so the positions at different time steps can be written as $x_1, \dots, x_K$ (the trajectory of the robot)
- The map is made up of several landmarks, and at each time step, the sensors can see a part of the landmarks and record their observations. Assume there is a total of $N$ landmarks in the map, and we will use $y_1, \dots, y_N$ to denote them.
These are just high-level abstract formalizations; see page 17.
Motion Equation (this is like the controller, can be obtained from IMU):
$$x_k = f(x_{k-1}, u_k, w_k)$$
where
- $x_k$ is the position at timestep $k$
- $u_k$ is the input commands
- $w_k$ is noise
Observation equation (this comes from the camera):
$$z_{k,j} = h(y_j, x_k, v_{k,j})$$
where
- $z_{k,j}$ is the observation data
- $y_j$ is a landmark point observed at $x_k$
- $v_{k,j}$ is the noise in this observation
confusion
This abstract equation is kind of confusing. We usually don't have these quantities directly, and I don't get the point of this equation. The motion equation is much more straightforward. To revisit.
"the robot sees a landmark point $y_j$ at position $x_k$ and generates an observation data $z_{k,j}$"
These two equations together describe a basic SLAM problem: how to solve the estimation of $x$ (localization) and $y$ (mapping) with the noisy control input $u$ and the sensor reading data $z$?
Now, as we see, we have modelled the SLAM problem as a State Estimation problem: How to estimate the internal, hidden state variables through the noisy measurement data?
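To make the abstract equations concrete, here is a toy simulation where the robot moves along a line and measures its distance to a few fixed landmarks. The specific choices of $f$ and $h$ here are my own, purely for illustration.

```python
# Toy instance of the generic SLAM model: 1D motion plus distance measurements.
import numpy as np

rng = np.random.default_rng(0)
landmarks = np.array([2.0, 5.0, 9.0])   # y_1..y_N (known only to the simulator)
x = 0.0                                  # true robot position x_0

for k in range(1, 6):
    u = 1.0                              # input command: "move 1 m forward"
    w = rng.normal(0.0, 0.1)             # motion noise w_k
    x = x + u + w                        # motion equation: x_k = f(x_{k-1}, u_k, w_k)

    v = rng.normal(0.0, 0.05, size=landmarks.shape)  # observation noise v_{k,j}
    z = landmarks - x + v                # observation equation: z_{k,j} = h(y_j, x_k, v_{k,j})

    print(f"k={k}: true x={x:.2f}, measurements z={np.round(z, 2)}")

# The SLAM problem: given only the u's and z's, estimate the x's (localization)
# and the landmark positions y (mapping).
```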
Example
Depending on the actual motion and the type of sensor, there are several kinds of parameterization methods. What is parameterization?
Motion equation example

For example, suppose our robot moves in a plane; then its pose is described by two coordinates and an angle, i.e., $\mathbf{x}_k = [x_1, x_2, \theta]^T_k$, where $x_1, x_2$ are the positions on the two axes and $\theta$ is the angle. At the same time, the input command is the position and angle change between the time interval: $\mathbf{u}_k = [\Delta x_1, \Delta x_2, \Delta \theta]^T_k$, so the motion equation can be parameterized as:

$$\begin{bmatrix} x_1 \\ x_2 \\ \theta \end{bmatrix}_k = \begin{bmatrix} x_1 \\ x_2 \\ \theta \end{bmatrix}_{k-1} + \begin{bmatrix} \Delta x_1 \\ \Delta x_2 \\ \Delta \theta \end{bmatrix}_k + \mathbf{w}_k$$

where $\mathbf{w}_k$ is the noise.
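A minimal code sketch of this parameterized motion equation (my own toy implementation: the commanded change and Gaussian noise are simply added to the previous pose, and the angle is wrapped).

```python
# One step of the planar motion model: pose_k = pose_{k-1} + u_k + w_k
import numpy as np

rng = np.random.default_rng(42)

def motion_step(pose, u, noise_std=0.01):
    """pose = [x1, x2, theta], u = [dx1, dx2, dtheta]."""
    w = rng.normal(0.0, noise_std, size=3)                      # noise w_k
    new_pose = pose + u + w                                     # additive motion model
    new_pose[2] = (new_pose[2] + np.pi) % (2 * np.pi) - np.pi   # keep theta in (-pi, pi]
    return new_pose

pose = np.array([0.0, 0.0, 0.0])
for u in [np.array([0.5, 0.0, 0.1])] * 5:    # drive forward while turning slightly
    pose = motion_step(pose, u)
print(np.round(pose, 3))
```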