Kullback-Leibler Divergence
Saw this word from here: https://spinningup.openai.com/en/latest/algorithms/trpo.html
This seems to be widely used as a way to measure the distance between two distributions in the Machine Learning world. However, there seems to be recent interest in computing the Sinkhorn Divergence instead.
Interesting, they also talk about this idea in F1TENTH for the Particle Filter.
Also hearing about it while learning about EMD.
Notation from here. The Kullback-Leibler divergence from the distribution $Q$ to the distribution $P$ is defined as
$$D_{\mathrm{KL}}(P \,\|\, Q) = \int p(x) \log \frac{p(x)}{q(x)} \, dx$$
where $p$ and $q$ are the respective densities of $P$ and $Q$.
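As a sanity check of the definition, here is a minimal sketch (my own, not from the linked notes) for the discrete case, where the integral becomes a sum over the support of $p$:

```python
import numpy as np

def kl_divergence(p, q):
    """KL divergence D_KL(P || Q) for discrete distributions p and q."""
    p = np.asarray(p, dtype=float)
    q = np.asarray(q, dtype=float)
    # Sum only over the support of p; the convention is 0 * log(0/q) = 0.
    mask = p > 0
    return float(np.sum(p[mask] * np.log(p[mask] / q[mask])))

p = [0.4, 0.6]
q = [0.5, 0.5]
print(kl_divergence(p, q))  # small positive number
print(kl_divergence(p, p))  # 0.0 -- a distribution has zero divergence from itself
```

Note that it is a divergence, not a metric: in general `kl_divergence(p, q) != kl_divergence(q, p)`, which is part of why people reach for alternatives like the Sinkhorn Divergence.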
https://dfdazac.github.io/sinkhorn.html "It can be shown that minimizing the KL divergence is equivalent to minimizing the Negative Log Likelihood, which is what we usually do when training a classifier, for example. In the case of the Variational Autoencoder, we want the approximate posterior to be close to some prior distribution, which we achieve, again, by minimizing the KL divergence between them."
"often dubbed Cross-Entropy Loss in the Deep Learning context" from here
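A toy example of why those names coincide (my own numbers, not from the linked post): for a one-hot target $t$, the cross-entropy $H(t, q) = H(t) + D_{\mathrm{KL}}(t \,\|\, q)$, and $H(t) = 0$ for a one-hot target, so the cross-entropy loss equals both the KL divergence and the negative log likelihood of the true class:

```python
import numpy as np

q = np.array([0.7, 0.2, 0.1])   # toy classifier output probabilities (assumed)
t = np.array([1.0, 0.0, 0.0])   # one-hot true label: class 0

cross_entropy = -np.sum(t * np.log(q))  # H(t, q)
nll = -np.log(q[0])                     # negative log likelihood of the true class
print(cross_entropy, nll)               # both equal -log(0.7)
```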
It can be shown that solving the Maximum Likelihood Estimation problem is equivalent to minimizing the Kullback-Leibler divergence from the model distribution to the data distribution.
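A quick way to see the equivalence (standard derivation, symbols mine): expand the KL divergence between the data distribution $p_{\mathrm{data}}$ and a model $p_\theta$, and note that the first term does not depend on $\theta$.

```latex
\begin{aligned}
D_{\mathrm{KL}}(p_{\mathrm{data}} \,\|\, p_\theta)
  &= \mathbb{E}_{x \sim p_{\mathrm{data}}}\!\left[\log p_{\mathrm{data}}(x)\right]
   - \mathbb{E}_{x \sim p_{\mathrm{data}}}\!\left[\log p_\theta(x)\right] \\
\arg\min_\theta D_{\mathrm{KL}}(p_{\mathrm{data}} \,\|\, p_\theta)
  &= \arg\max_\theta \mathbb{E}_{x \sim p_{\mathrm{data}}}\!\left[\log p_\theta(x)\right]
  \approx \arg\max_\theta \frac{1}{N}\sum_{i=1}^{N} \log p_\theta(x_i)
\end{aligned}
```

The last expression is exactly the (sample-average) log-likelihood, so maximizing likelihood and minimizing this KL divergence pick out the same $\theta$.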