# Earth Mover’s Distance

In statistics, the earth mover’s distance (EMD) is a measure of the distance between two probability distributions over a region $D$.

In general, we use the EMD to quantify the cost of transporting things from $A$ to $B$, and minimize that cost, which is essentially the Optimal Transport problem.

This talk is an interesting idea: Detecting Anomalies Using Statistical Distances | SciPy 2018 | Charles Masson

Also See also Sinkhorn Distance.

### Compute EMD

We can compute the EMD by framing it as an Optimal Transport problem, The EMD can be computed by solving an instance of transportation problem, using any algorithm for minimum-cost flow problem, e.g. the network simplex algorithm.

The Hungarian algorithm can be used to get the solution if the domain $D={0,1}$

As a special case, if $D$ is a one-dimensional array of “bins” of size n, the EMD can be efficiently computed by scanning the array and keeping track of how much dirt needs to be transported between consecutive bins. Here the bins are zero-indexed: $EMD_{0}EMD_{i+1}Total Distance =0=P_{i}+EMD_{i}−Q_{i}=i=0∑n ∣EMD_{i}∣ $

### Optimal Transport Problem

This took me so long to understand how it is calculated, see example calculation here. actually this is way better.

Why EMD vs. simpler metrics like Euclidean Distance? See this Stackoverflow thread.

- Because EMD is more designed to measure distances between distributions
- L1/L2 distance don’t model these points are distributions (when we use K-Means Clustering, we try to find a centroid such that the variance of the cluster is minimized)
- Ex: Visualize the distances between $[1,0,0],[0,1,0],[0,0,1]$. Euclidean Distance would treat them as equivalent, but EMD would not because there is a transport cost associated.

Notes from this youtube video

- https://www.youtube.com/watch?v=CDiol4LG2Ao&ab_channel=AppliedAlgebraicTopologyNetwork
- If we were to use function distances, then $∥f−g∥≈∥f−h∥$, because it doesn’t encode the distance between the distributions.

Geodesics in the Wasserstein Metric space can preserve the shape of the function, whereas a function distance would not preserve the shape.

$D=min{i,j∑ π_{i,j}d(x_{i},y_{i})}$ It’s using the concept of work.

### Serendipity

I first heard about this through the Poker AI project, but apparently it is super used in Poker AI project, but apparently it is super used in Computer Vision: https://link.springer.com/content/pdf/10.1023/A:1026543900054.pdf this paper has been cited 5293 times

Also applied in the world of Natural Language Processing

https://www.youtube.com/watch?v=ymWDGzpQdls&ab_channel=BobLaramee

EMD in linear time : https://www.ibm.com/blogs/research/2019/07/earth-movers-distance/

Computing distance between two points is easy. Computing distance between two areas is more difficult. How would you compute the distance between the US and the UK?

- Using the centroids of the two countries?
- Closest distance?
- Average distance using chunks of the countries?
- Computing distance between two distributions?

We discretize these distributions with a histogram.

EMD is commonly used in computer vision for comparing colour distribution or texture histograms of images for content based image retrieval.

### Related

- Wasserstein GAN