Nonlinear Optimization Least Squares
Nonlinear Least Squares
A nonlinear version of the least squares problem, where the error term $e(x)$ is nonlinear.
Options to solve:
 Ceres Solver http://ceres-solver.org/
 G2o https://github.com/RainerKuemmerle/g2o
 Eigen?
 Sophus
 cuSOLVER https://docs.nvidia.com/cuda/cusolver/index.html#using-the-cusolver-api
 Found from a quick Google search https://github.com/Rookfighter/least-squares-cpp
Notes from Cyrill Stachniss course.
I find the notes from the SLAM Textbook to be really good.
The goal is to minimize the function $\min_{x} F(x) = \frac{1}{2}\|f(x)\|_{2}^{2}$
We solve it iteratively in the following steps:
Abstract
 Give an initial value $x_{0}$.
 For the $k$th iteration, we find an incremental value $\Delta x_{k}$, such that the objective function $\|f(x_{k}+\Delta x_{k})\|_{2}^{2}$ reaches a smaller value.
 If $\Delta x_{k}$ is small enough, stop the algorithm.
 Otherwise, let $x_{k+1}=x_{k}+\Delta x_{k}$ and return to step 2.
 This feels very much like Gradient Descent. How do we find $\Delta x_{k}$? In gradient descent, it's just some step size multiplied by the gradient of the function.
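The iterative loop above can be sketched as follows, using the gradient-descent choice of $\Delta x_k$. This is a toy sketch, not code from the course: the residual $f(x) = x^2 - 2$ (whose least-squares minimum is $x = \sqrt{2}$), the step size, and the tolerance are made-up illustration values.

```python
import math

def f(x):
    # Toy residual: minimizing F(x) = 0.5 * f(x)**2 drives f(x) toward 0.
    return x * x - 2.0

def grad_F(x):
    # dF/dx = f(x) * f'(x), with f'(x) = 2x for this residual.
    return f(x) * 2.0 * x

def solve(x0, step=0.1, tol=1e-10, max_iters=1000):
    """The four steps from the notes, with a gradient-descent increment."""
    x = x0                          # step 1: initial value x_0
    for _ in range(max_iters):
        dx = -step * grad_F(x)      # step 2: find an increment dx_k
        if abs(dx) < tol:           # step 3: stop if dx_k is small enough
            break
        x = x + dx                  # step 4: x_{k+1} = x_k + dx_k
    return x

print(solve(1.0))  # converges toward sqrt(2)
```

Note how the step size is a free parameter here, which is exactly the weakness discussed below.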
Take the Taylor expansion of $F$: $F(x_{k}+\Delta x_{k}) \approx F(x_{k}) + J(x_{k})^{T}\Delta x_{k} + \frac{1}{2}\Delta x_{k}^{T} H(x_{k})\Delta x_{k}$
 $J(x_{k})$ is the first derivative of $F(x)$ (called the Jacobian Matrix)
 $H(x_{k})$ is the second derivative of $F(x)$ (called the Hessian Matrix)
Note
Depending on whether you keep only the $J$ term or also the $H$ term, you end up with a first-order method (Gradient Descent) or a second-order method (Newton's Method).
 Gradient descent is volatile because the step size must be chosen correctly, and it does not guarantee convergence
 Newton's method is computationally expensive because it requires the full Hessian $H$
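To make the first-order vs. second-order distinction concrete, here is a sketch of Newton's method on the toy objective $F(x) = \frac{1}{2}(x^2-2)^2$; the function and starting point are invented for illustration, and in 1-D the "Jacobian" and "Hessian" are just scalar derivatives.

```python
def F_prime(x):
    # First derivative of F(x) = 0.5 * (x^2 - 2)^2 (the 1-D "Jacobian").
    return 2.0 * x * (x * x - 2.0)

def F_second(x):
    # Second derivative (the 1-D "Hessian").
    return 6.0 * x * x - 4.0

def newton(x0, iters=6):
    # Second-order update: dx_k = -F'(x_k) / F''(x_k).
    x = x0
    for _ in range(iters):
        x = x - F_prime(x) / F_second(x)
    return x
```

Compared with gradient descent there is no step size to tune and convergence near the minimum is very fast; the cost is computing (and, in higher dimensions, inverting) $H$ at every step.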
The book introduces two quasi-Newton methods to avoid calculating $H$ directly:
Practical advice
We usually choose one of the Gauss-Newton or Levenberg-Marquardt methods as the descent strategy in practical problems.
 When the problem is well-formed, Gauss-Newton is used
 Otherwise, in ill-formed problems, we use the Levenberg-Marquardt method
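A minimal sketch of a Gauss-Newton fit, which avoids the full Hessian by approximating $H \approx J^{T}J$ from the residual Jacobian alone. The model $y = e^{ax}$, the data, and the iteration count are invented for illustration; a real system would use a library like Ceres or g2o.

```python
import math

# Synthetic data generated from y = exp(a * x) with a "true" a of 0.3.
xs = [0.0, 1.0, 2.0, 3.0]
ys = [math.exp(0.3 * x) for x in xs]

def gauss_newton(a0, iters=10):
    a = a0
    for _ in range(iters):
        JtJ, Jtr = 0.0, 0.0
        for x, y in zip(xs, ys):
            r = y - math.exp(a * x)    # residual r_i(a) = y_i - exp(a * x_i)
            J = -x * math.exp(a * x)   # Jacobian dr_i/da
            JtJ += J * J               # H approximated by J^T J
            Jtr += J * r
        a -= Jtr / JtJ                 # solve (J^T J) da = -J^T r
    return a
```

Levenberg-Marquardt differs mainly in damping this step, solving $(J^{T}J + \lambda I)\Delta = -J^{T}r$, which is what makes it more robust on ill-formed problems.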
In SLAM Context
In the context of SLAM, we want to minimize the error between the current observation and the prediction.
So how is $F(x)$ defined? $F(x)=\sum_{i} e_{i}(x)$
where $e_{i}(x)=z_{i}-f_{i}(x)$

So how are those values actually computed?

$z_{i}$ is the actual observation, e.g. the measured coordinate of a point.

$f_{i}(x)$ is the predicted observation, e.g. the projection of the 3D point onto the 2D image plane given the current state estimate $x$.
What is this equation? $e_{i}(x)=e_{i}(x)^{T}\,\Omega_{i}\,e_{i}(x)$, where $\Omega_{i}$ is the information matrix (the inverse of the measurement covariance) that weights each error term.
 The error depends only on the state $x$
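Putting the SLAM pieces together, the total error can be sketched as below. The 2-D residuals and the identity information matrices are made-up placeholders; $\Omega_{i}$ is assumed to be the information matrix weighting each term, as in the quadratic form above.

```python
def quad_form(e, Omega):
    # Computes e^T * Omega * e for a 2-vector e and a 2x2 matrix Omega.
    return sum(e[r] * Omega[r][c] * e[c] for r in range(2) for c in range(2))

def total_error(zs, fs, Omegas):
    # F(x) = sum_i e_i(x)^T Omega_i e_i(x), with e_i = z_i - f_i(x).
    F = 0.0
    for z, f, Omega in zip(zs, fs, Omegas):
        e = [z[0] - f[0], z[1] - f[1]]
        F += quad_form(e, Omega)
    return F

# With identity Omega_i this reduces to a plain sum of squared errors.
I2 = [[1.0, 0.0], [0.0, 1.0]]
```

This is the scalar objective that Gauss-Newton or Levenberg-Marquardt then drives downward by adjusting the state $x$.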