Visual Odometry (VO)

This is fundamental knowledge for me to implement SLAM.

NVIDIA implements it in the Elbrus library (since renamed cuVSLAM)

VO’s task is to estimate the camera’s motion between adjacent frames (ego-motion) and build a rough local map. In the SLAM pipeline, VO is also known as the frontend.
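To make "estimate motion between adjacent frames" concrete, here is a toy sketch (pure Python, all names illustrative, not from any SLAM library) of how per-frame relative motions get chained into a global camera pose. It uses a 2D pose (x, y, theta) for simplicity; real VO works in 3D (SE(3)):

```python
import math

def compose(pose, delta):
    """Compose a global 2D pose (x, y, theta) with a relative motion
    (dx, dy, dtheta) expressed in the current camera frame."""
    x, y, th = pose
    dx, dy, dth = delta
    return (x + math.cos(th) * dx - math.sin(th) * dy,
            y + math.sin(th) * dx + math.cos(th) * dy,
            th + dth)

# Each VO step yields a small frame-to-frame motion; integrating them
# gives the camera trajectory.
pose = (0.0, 0.0, 0.0)
for _ in range(4):
    pose = compose(pose, (1.0, 0.0, math.pi / 2))  # move 1 m, turn 90°
print(pose)  # the square path closes: position back near (0, 0)
```

The key point: VO only ever outputs the `delta` terms; the global `pose` is whatever you get by composing them.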


Why isn't visual odometry enough? Why do we need SLAM?

Because the errors compound. VO only estimates the motion between consecutive frames, so any per-frame estimation error accumulates over time (drift), and the trajectory ends up with the wrong position and orientation. SLAM adds a backend (optimization and loop closure) that corrects this accumulated drift.
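A toy experiment (pure Python, illustrative numbers) showing why this matters: the camera actually drives straight, but the per-frame heading estimate carries a tiny bias of 0.1 degrees. Because each frame's pose is built on the previous one, the position error grows without bound:

```python
import math

BIAS = math.radians(0.1)   # tiny per-frame heading error (0.1 degrees)
STEP = 1.0                 # true motion: 1 m straight ahead each frame

x = y = heading = 0.0      # estimated pose, integrated frame by frame
errors = []
for frame in range(1, 1001):
    heading += BIAS                      # the bias compounds every frame
    x += STEP * math.cos(heading)
    y += STEP * math.sin(heading)
    if frame in (10, 100, 1000):
        # ground truth is (frame, 0): the camera really went straight
        err = math.hypot(x - frame, y)
        errors.append(err)
        print(f"after {frame:4d} frames: position error {err:.2f} m")
```

After 10 frames the error is centimeters; after 1000 frames it is hundreds of meters, even though no single frame-to-frame estimate was badly wrong.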

From the Visual SLAM book, there are 2 different methods to do VO:

  1. Feature method (extract and match keypoints between frames, then solve for the motion that aligns them)
  2. Direct method (skip feature extraction and estimate motion by minimizing photometric error on pixel intensities)
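The core step of the feature method, once features are matched between two frames, is solving for the rigid motion that best aligns them. A hedged 2D toy of that alignment step using the Kabsch/SVD solution in numpy (real VO works with 3D points projected into images and epipolar geometry, so this only illustrates the alignment idea; all names are mine):

```python
import numpy as np

def rigid_align(src, dst):
    """Least-squares rigid transform (R, t) mapping src onto dst
    (Kabsch/Procrustes via SVD). src, dst: (N, 2) matched points."""
    mu_s, mu_d = src.mean(axis=0), dst.mean(axis=0)
    H = (src - mu_s).T @ (dst - mu_d)          # cross-covariance matrix
    U, _, Vt = np.linalg.svd(H)
    d = np.sign(np.linalg.det(Vt.T @ U.T))     # guard against reflections
    R = Vt.T @ np.diag([1.0, d]) @ U.T
    t = mu_d - R @ mu_s
    return R, t

# Synthetic "matched features": frame 2 sees the same points rotated
# by 30° and translated by (2, 1) relative to frame 1.
rng = np.random.default_rng(0)
pts = rng.uniform(-1, 1, size=(50, 2))
theta = np.radians(30)
R_true = np.array([[np.cos(theta), -np.sin(theta)],
                   [np.sin(theta),  np.cos(theta)]])
moved = pts @ R_true.T + np.array([2.0, 1.0])

R_est, t_est = rigid_align(pts, moved)
print(np.round(t_est, 3))   # recovers the translation, ≈ [2. 1.]
```

In a real feature-based frontend the matching comes from descriptors (e.g. ORB) and the motion from the essential matrix or PnP, but the "solve for the motion that explains the matches" structure is the same.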

Some repos: