Nonlinear Optimization Least Squares
Nonlinear Least Squares
A nonlinear version of the least squares problem, where the error term $e(x)$ is nonlinear.
Options to solve:
 Ceres Solver http://ceres-solver.org/
 G2o https://github.com/RainerKuemmerle/g2o
 Eigen?
 Sophus
 cuSOLVER https://docs.nvidia.com/cuda/cusolver/index.html#using-the-cusolver-api
 Found from a quick Google search https://github.com/Rookfighter/least-squares-cpp
Notes from Cyrill Stachniss course.
I find the notes from the SLAM Textbook to be really good.
The goal is to minimize the function $\min_{x} F(x) = \frac{1}{2}\|f(x)\|_{2}^{2}$
We solve it iteratively in the following steps:
Abstract
 Give an initial value $x_{0}$.
 For the $k$th iteration, we find an incremental value $\Delta x_{k}$, such that the objective function $\|f(x_{k}+\Delta x_{k})\|_{2}^{2}$ reaches a smaller value.
 If $\Delta x_{k}$ is small enough, stop the algorithm.
 Otherwise, let $x_{k+1}=x_{k}+\Delta x_{k}$ and return to step 2.
 This feels very much like Gradient Descent. How do we find $\Delta x_{k}$? In gradient descent, it's just some step size multiplied by the gradient of the function.
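The iterative loop above can be sketched as follows, using the gradient-descent choice of $\Delta x_k$. This is a toy sketch, not code from the course: the residual $f(x) = x^2 - 2$ (whose least-squares minimum is $x = \sqrt{2}$), the step size, and the tolerance are made-up illustration values.

```python
import math

def f(x):
    # Toy residual: minimizing F(x) = 0.5 * f(x)**2 drives f(x) toward 0.
    return x * x - 2.0

def grad_F(x):
    # dF/dx = f(x) * f'(x), with f'(x) = 2x for this residual.
    return f(x) * 2.0 * x

def solve(x0, step=0.1, tol=1e-10, max_iters=1000):
    """The four steps from the notes, with a gradient-descent increment."""
    x = x0                          # step 1: initial value x_0
    for _ in range(max_iters):
        dx = -step * grad_F(x)      # step 2: find an increment dx_k
        if abs(dx) < tol:           # step 3: stop if dx_k is small enough
            break
        x = x + dx                  # step 4: x_{k+1} = x_k + dx_k
    return x

print(solve(1.0))  # converges toward sqrt(2)
```

Note how the step size is a free parameter here, which is exactly the weakness discussed below.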
Take the Taylor expansion of $F$: $F(x_{k}+\Delta x_{k}) \approx F(x_{k}) + J(x_{k})^{T}\Delta x_{k} + \frac{1}{2}\Delta x_{k}^{T} H(x_{k})\Delta x_{k}$
 $J(x_{k})$ is the first derivative of $F(x)$ (called the Jacobian Matrix)
 $H(x_{k})$ is the second derivative of $F(x)$ (called the Hessian Matrix)
Note
Depending on whether you keep only the $J$ term or also the $H$ term, you end up with a first-order method (Gradient Descent) or a second-order method (Newton's Method).
 Gradient descent is volatile because the step size must be chosen correctly, and it does not guarantee convergence
 Newton's method is computationally expensive because it requires the full Hessian $H$
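To make the first-order vs. second-order distinction concrete, here is a sketch of Newton's method on the toy objective $F(x) = \frac{1}{2}(x^2-2)^2$; the function and starting point are invented for illustration, and in 1-D the "Jacobian" and "Hessian" are just scalar derivatives.

```python
def F_prime(x):
    # First derivative of F(x) = 0.5 * (x^2 - 2)^2 (the 1-D "Jacobian").
    return 2.0 * x * (x * x - 2.0)

def F_second(x):
    # Second derivative (the 1-D "Hessian").
    return 6.0 * x * x - 4.0

def newton(x0, iters=6):
    # Second-order update: dx_k = -F'(x_k) / F''(x_k).
    x = x0
    for _ in range(iters):
        x = x - F_prime(x) / F_second(x)
    return x
```

Compared with gradient descent there is no step size to tune and convergence near the minimum is very fast; the cost is computing (and, in higher dimensions, inverting) $H$ at every step.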
The book introduces two quasi-Newton methods to avoid calculating $H$ directly:
Practical advice
We usually choose one of the Gauss-Newton or Levenberg-Marquardt methods as the descent strategy in practical problems.
 When the problem is well-formed, Gauss-Newton is used
 Otherwise, in ill-formed problems, we use the Levenberg-Marquardt method
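A minimal sketch of a Gauss-Newton fit, which avoids the full Hessian by approximating $H \approx J^{T}J$ from the residual Jacobian alone. The model $y = e^{ax}$, the data, and the iteration count are invented for illustration; a real system would use a library like Ceres or g2o.

```python
import math

# Synthetic data generated from y = exp(a * x) with a "true" a of 0.3.
xs = [0.0, 1.0, 2.0, 3.0]
ys = [math.exp(0.3 * x) for x in xs]

def gauss_newton(a0, iters=10):
    a = a0
    for _ in range(iters):
        JtJ, Jtr = 0.0, 0.0
        for x, y in zip(xs, ys):
            r = y - math.exp(a * x)    # residual r_i(a) = y_i - exp(a * x_i)
            J = -x * math.exp(a * x)   # Jacobian dr_i/da
            JtJ += J * J               # H approximated by J^T J
            Jtr += J * r
        a -= Jtr / JtJ                 # solve (J^T J) da = -J^T r
    return a
```

Levenberg-Marquardt differs mainly in damping this step, solving $(J^{T}J + \lambda I)\Delta = -J^{T}r$, which is what makes it more robust on ill-formed problems.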
In SLAM Context
In the context of SLAM, we want to minimize the error between the current observation and the prediction.
So how is $F(x)$ defined? $F(x)=\sum_{i} e_{i}(x)$
where $e_{i}(x)=z_{i}-f_{i}(x)$

So how are those values actually computed?

$z_{i}$ is the actual observation, e.g. the measured coordinate of a point.

$f_{i}(x)$ is the predicted observation, e.g. the projection of the 3D point onto the 2D image plane given the current state estimate $x$.
What is this equation? $e_{i}(x)=e_{i}(x)^{T}\,\Omega_{i}\,e_{i}(x)$, where $\Omega_{i}$ is the information matrix (the inverse of the measurement covariance) that weights each error term.
 The error depends only on the state $x$
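Putting the SLAM pieces together, the total error can be sketched as below. The 2-D residuals and the identity information matrices are made-up placeholders; $\Omega_{i}$ is assumed to be the information matrix weighting each term, as in the quadratic form above.

```python
def quad_form(e, Omega):
    # Computes e^T * Omega * e for a 2-vector e and a 2x2 matrix Omega.
    return sum(e[r] * Omega[r][c] * e[c] for r in range(2) for c in range(2))

def total_error(zs, fs, Omegas):
    # F(x) = sum_i e_i(x)^T Omega_i e_i(x), with e_i = z_i - f_i(x).
    F = 0.0
    for z, f, Omega in zip(zs, fs, Omegas):
        e = [z[0] - f[0], z[1] - f[1]]
        F += quad_form(e, Omega)
    return F

# With identity Omega_i this reduces to a plain sum of squared errors.
I2 = [[1.0, 0.0], [0.0, 1.0]]
```

This is the scalar objective that Gauss-Newton or Levenberg-Marquardt then drives downward by adjusting the state $x$.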