Bundle adjustment in SLAM is an optimization technique. It refines estimates of camera poses and 3D points in the map to minimize the re-projection error.

Bundle adjustment is a least squares solution for orienting images and estimating 3D point locations.

• SLAM: Creates a map of an unknown environment while also tracking the agent's position within it.
• Bundle Adjustment: Optimizes the estimated camera poses and 3D points simultaneously.

Resources

Implementations

Is BA the bottleneck of the SLAM pipeline?

Bottleneck is still the data association according to Cyrill.

Bundle βblockβ adjustment: multiple images are corrected βen blocβ

Key idea of BA

1. Start from initial estimates of the camera poses and 3D points
2. Project the estimated 3D points into the estimated camera images
3. Compare locations of the projected 3D points with measured (2D) ones
4. Adjust to minimize error in the images
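The projection and comparison steps can be sketched with a basic pinhole model. This is a minimal illustration; the intrinsics `K`, pose `(R, t)`, and point values below are made-up numbers, not anything from the notes.

```python
import numpy as np

def project(K, R, t, X):
    """Project 3D world points X (N,3) into the image with a pinhole model."""
    Xc = (R @ X.T + t.reshape(3, 1)).T   # world frame -> camera frame
    uv = (K @ Xc.T).T                    # apply intrinsics
    return uv[:, :2] / uv[:, 2:3]        # perspective divide

def reprojection_error(K, R, t, X, z):
    """Residuals between projected 3D points and measured 2D points z (N,2)."""
    return project(K, R, t, X) - z

# Made-up intrinsics and a camera at the world origin
K = np.array([[500., 0., 320.], [0., 500., 240.], [0., 0., 1.]])
R, t = np.eye(3), np.zeros(3)
X = np.array([[0., 0., 5.], [1., -1., 4.]])   # 3D points in front of the camera
z = project(K, R, t, X)                       # perfect (noise-free) measurements
# reprojection_error(K, R, t, X, z) is zero here; BA perturbs poses/points
# to drive this error down for real, noisy measurements.
```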

### Notes from SLAM Textbook

Get a refresher from the SLAM Formalization.

We have the motion and observation equations

$$\begin{align} x_k &= f(x_{k-1}, u_k) + w_k \\ z_{k,j} &= h(x_k, y_j) + v_{k,j} \end{align}$$

• $u_k$: motion data
• $z_{k,j}$: observation data
• first subscript is for the timestamp, second subscript is for the $j$-th landmark
• $x_k$: state (you don't have this)
• $y_j$: landmarks (you don't have this)

What are the functions $f$ and $h$?

• Both are just transforms

Goal: Given the data $u$ and $z$, find $x$ and $y$ over time.

We want to figure out the ground-truth poses and landmark data. Assume that the variables follow a normal distribution, so our goal is to figure out the probability distribution of $x, y$ given the data:

$$P(x, y \mid z, u)$$


This is an MLE problem, but I haven't seen the multivariate version in my statistics class. We use Bayes' Rule:

$$P(x, y \mid z, u) = \frac{P(z, u \mid x, y)\, P(x, y)}{P(z, u)} \propto P(z, u \mid x, y)\, P(x, y)$$

Reminders

It is normally difficult to find the posterior distribution directly (in nonlinear systems), but it is feasible to find an optimal point which maximizes the posterior (MAP):

$$(x, y)^* = \arg\max P(x, y \mid z, u) = \arg\max P(z, u \mid x, y)\, P(x, y)$$

The MLE problem is given by

$$(x, y)^* = \arg\max P(z, u \mid x, y)$$

• We just get rid of the prior $P(x, y)$, because we don't have a prior estimate

βWhat state, it is most likely to produce the data currently observed?β

So how can we do this? We can actually rewrite the probability equation:

• Remember that the measurements are independent, so $P(z, u \mid x, y) = \prod_k P(u_k \mid x_{k-1}, x_k, y) \prod_{k,j} P(z_{k,j} \mid x_k, y_j)$
• We can further simplify: the control input is independent of the landmark points and depends only on the poses, so $P(u_k \mid x_{k-1}, x_k, y) = P(u_k \mid x_{k-1}, x_k)$

We can handle movement and observation independently. We define the error terms as

$$\begin{align} e_{u,k} &= x_k - f(x_{k-1}, u_k) \\ e_{z,k,j} &= z_{k,j} - h(x_k, y_j) \end{align}$$

We know that if we sample $z_{k,j}$, then $z_{k,j} = h(x_k, y_j) + v_{k,j}$. The probability distribution is

$$P(z_{k,j} \mid x_k, y_j) = N(h(x_k, y_j), Q_{k,j})$$

Solving the MLE problem, we have

$$\begin{align} (x_k, y_j)^* &= \arg\max P(z_{k,j} \mid x_k, y_j) \\ &= \arg\max N(h(x_k, y_j), Q_{k,j}) \\ &= \arg\min \big( (z_{k,j} - h(x_k, y_j))^T Q_{k,j}^{-1} (z_{k,j} - h(x_k, y_j)) \big) \end{align}$$

Our goal is to minimize these error terms. We get

$$\min J(x, y) = \sum_k e^T_{u,k} R_k^{-1} e_{u,k} + \sum_k \sum_j e^T_{z,k,j} Q^{-1}_{k,j} e_{z,k,j}$$

- A least-squares problem is obtained with the same solution as the MLE problem

#### Bundle Adjustment Chapter 8

![[attachments/Screenshot 2024-05-09 at 11.08.55 AM.png]]

We write out the error

$$e = z - h(T, p)$$

- $e$ is also called the [[notes/Residual|Residual]]

The overall cost function is

$$\frac{1}{2}\sum_{i=1}^m \sum_{j=1}^n \| e_{ij}\|^2 = \frac{1}{2}\sum_{i=1}^m \sum_{j=1}^n \| z_{ij} - h(T_i, p_j)\|^2$$

Solving this is equivalent to solving for the poses $T$ and landmarks $p$ at the same time. This is BA. We have the state

$$x = [T_1, \dots, T_m, p_1, \dots, p_n]^T$$

Solve with [[notes/g2o|g2o]].

#### Chapter 9 of Bundle Adjustment

This chapter talks about real-time SLAM. In the previous chapter, we saw the basic BA setup. However, to run in real time, you end up having more and more landmarks, and it becomes computationally WAY too expensive.

I was having trouble wrapping my head around why VO is less accurate than BA on multiple keyframes, since the constraints are only between two different poses.

- NOPE, you actually want to track the landmarks over multiple keyframes.

#### Sliding Window Method

Therefore, we limit the calculations of the backend. We use a local map. Very simple fix: keep only the $N$ keyframes closest to the current moment, and remove the earlier ones. Solve BA with only those keyframes.

Co-visibility graph

If there are only adjacent keyframe constraints, we cannot do much, nor can we eliminate the accumulated error.
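The sliding-window bookkeeping above can be sketched in a few lines. This is an illustrative skeleton only; `on_new_keyframe` and `run_local_ba` are hypothetical names, and the real backend would also manage the landmarks shared by the windowed keyframes.

```python
from collections import deque

# Keep only the N keyframes closest to the current moment; a deque with
# maxlen drops the oldest keyframe automatically when a new one arrives.
N = 7
window = deque(maxlen=N)

def on_new_keyframe(kf, run_local_ba):
    """Insert a keyframe, then run BA restricted to the local window."""
    window.append(kf)
    run_local_ba(list(window))   # local BA sees at most N keyframes

ba_sizes = []
for i in range(10):
    on_new_keyframe(i, lambda w: ba_sizes.append(len(w)))
# window now holds only keyframes 3..9; BA cost stays bounded over time
```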
Wait, yes: in this pose graph, the constraints are really only between adjacent keyframes.

### Notes from Cyrill Stachniss

![[attachments/Screenshot 2023-10-25 at 9.37.39 AM.png]]

> [!question] Why is the correction done for the 2D point, and not the 3D point?
>
> ??

Unknowns:

- 3D locations of new points
- 1D scale factor
- 6D exterior orientation
- 5D projection parameters (interior orientation)
- Non-linear distortion parameter

In the end, you don't really need the scale factor.

![[attachments/Screenshot 2023-10-25 at 9.50.38 AM.png]]
![[attachments/Screenshot 2023-10-25 at 9.50.47 AM.png]]
![[attachments/Screenshot 2023-10-25 at 9.52.52 AM.png]]

Are the control points noisy or noise-free?

### Initial Guess

Having a good initial guess is key. How are we supposed to obtain the initial guess?

- Compute [[notes/Relative Orientation|Relative Orientation]] from the first image pair
- Compute the orientation of subsequent images through P3P/RRS

Critical issues:

- Gross error handling is essential
- Requires enough new points in each image overlap
- No singular/critical configurations

### Gross Errors / Outliers

Reasons for gross errors:

1. Wrong correspondences
2. Wrong point measurements

Observations per point:

- At least 4 views are needed to identify an observation with a gross error
- Points observed from 5 to 6 different views typically yield good estimates

> [!tip] Eliminating Gross Errors
>
> - Decompose the problem into several small blocks of 3-6 images first
> - Check for gross errors in the small blocks using statistical tests
> - Only consider features that can be tracked consistently in all 3-6 images
> - Relative orientation (5-Point algo.) combined with RANSAC
> - Eliminate gross errors, then run full BA

### Robust Kernels
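A minimal sketch of one common robust kernel, the Huber loss, which is one way to keep gross errors from dominating the BA cost (the threshold `delta` here is an assumed tuning constant): it is quadratic for small residuals but only linear for large ones.

```python
import numpy as np

def huber(e, delta=1.0):
    """Huber kernel: quadratic for |e| <= delta, linear beyond it."""
    a = np.abs(e)
    return np.where(a <= delta, 0.5 * e ** 2, delta * (a - 0.5 * delta))

residuals = np.array([0.1, 0.5, 5.0])   # last residual is a gross outlier
costs = huber(residuals)
# the outlier contributes 4.5 instead of the quadratic 0.5 * 5^2 = 12.5,
# so a wrong correspondence pulls the BA solution far less
```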