Distance Metric
Choosing a good distance metric is very important for building a well-performing model.
https://medium.com/analyticsvidhya/roleofdistancemetricsinmachinelearninge43391a6bf2e
Suppose two objects $x$ and $y$ each have $p$ features: $x=(x_{1},x_{2},\dots,x_{p})$ and $y=(y_{1},y_{2},\dots,y_{p})$.
From Lecture 13 of the Carnegie Mellon course. The Minkowski metric/distance is defined by $d(x,y)=\left(\sum_{i=1}^{p}\lvert x_{i}-y_{i}\rvert^{r}\right)^{1/r}$
This generalizes distance metrics you are already very familiar with in ML:
 L1 Distance (Manhattan Distance) ($r=1$)
 L2 Distance (Euclidean Distance) ($r=2$)
 Chebyshev Distance (“sup” distance) ($r=∞$)
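The three special cases can be spot-checked with a small NumPy sketch (the `minkowski` helper below is just illustrative, not from the lecture):

```python
import numpy as np

def minkowski(x, y, r):
    """Minkowski distance between two p-dimensional points.

    r=1 gives Manhattan, r=2 Euclidean, r=inf Chebyshev."""
    diff = np.abs(np.asarray(x, dtype=float) - np.asarray(y, dtype=float))
    if np.isinf(r):
        return diff.max()               # "sup" distance: limit as r -> infinity
    return (diff ** r).sum() ** (1.0 / r)

x, y = [0, 0], [3, 4]
print(minkowski(x, y, 1))        # 7.0 (Manhattan)
print(minkowski(x, y, 2))        # 5.0 (Euclidean)
print(minkowski(x, y, np.inf))   # 4.0 (Chebyshev)
```

For large $r$ the largest coordinate difference dominates the sum, which is why the Chebyshev distance is the $r\to\infty$ limit.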
Other Distance metrics

Edit Distance: a general technique for measuring similarity, where we look at the amount of effort needed to transform one object into another. Oh, so Earth Mover’s Distance is an edit-distance-style measure then? (It measures the work needed to move one distribution’s mass into the other’s shape.)
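A minimal sketch of one concrete edit distance, the Levenshtein distance (unit cost for single-character insert/delete/substitute is an assumption of this particular variant):

```python
def edit_distance(a: str, b: str) -> int:
    """Levenshtein distance: minimum number of single-character
    insertions, deletions, and substitutions turning a into b."""
    # Classic dynamic program, keeping only one previous row.
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        curr = [i]
        for j, cb in enumerate(b, 1):
            cost = 0 if ca == cb else 1
            curr.append(min(prev[j] + 1,          # delete from a
                            curr[j - 1] + 1,      # insert into a
                            prev[j - 1] + cost))  # substitute (or match)
        prev = curr
    return prev[-1]

print(edit_distance("kitten", "sitting"))  # 3
```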
Properties that every distance metric should have
 Symmetry: $D(A,B)=D(B,A)$
 Otherwise you can claim that Alex looks like Bob, but Bob looks nothing like Alex
 Constancy of Self-Similarity: $D(A,A)=0$
 Otherwise you could claim that Alex looks more like Bob than Bob does
 Positivity (Separation): $D(A,B)=0 \iff A=B$
 Otherwise there could be objects in your world that are different, but that you cannot tell apart
 Triangle Inequality $D(A,B)≤D(A,C)+D(B,C)$
 Otherwise you could claim Alex is very much like Bob and very much like Carl, even though Bob is very unlike Carl
I also see non-negativity on Wikipedia:
 Non-Negativity: $D(A,B)\geq 0$ (this actually follows from symmetry, self-similarity, and the triangle inequality)
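The axioms above can be spot-checked numerically. A small sketch using Manhattan distance on random 3-D points (the point count, seed, and tolerance are arbitrary choices):

```python
import itertools
import random

def manhattan(a, b):
    """L1 distance between two equal-length tuples of coordinates."""
    return sum(abs(x - y) for x, y in zip(a, b))

random.seed(0)
points = [tuple(random.uniform(-1, 1) for _ in range(3)) for _ in range(15)]

for A, B, C in itertools.product(points, repeat=3):
    assert manhattan(A, B) == manhattan(B, A)   # symmetry
    assert manhattan(A, A) == 0                 # constancy of self-similarity
    assert manhattan(A, B) >= 0                 # non-negativity
    # triangle inequality (tiny tolerance for floating-point rounding)
    assert manhattan(A, B) <= manhattan(A, C) + manhattan(C, B) + 1e-12

print("all four axioms hold on the sample")
```

Passing such a check on samples is of course not a proof, but it catches a broken "distance" quickly.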
Examples
Taken from Wikipedia
Metrics
 Total variation distance (sometimes just called “the” statistical distance)
 Hellinger distance
 Lévy–Prokhorov metric
 Wasserstein metric: also known as the Kantorovich metric, or earth mover’s distance
 Mahalanobis Distance
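For instance, the total variation distance between two distributions on the same finite support reduces to half an L1 distance. A quick sketch (the example distributions are made up):

```python
def total_variation(p, q):
    """Total variation distance between two discrete distributions
    on the same support: half the L1 distance between them, i.e. the
    largest difference in probability they assign to any event."""
    return 0.5 * sum(abs(pi - qi) for pi, qi in zip(p, q))

p = [0.5, 0.3, 0.2]
q = [0.4, 0.4, 0.2]
print(total_variation(p, q))  # ≈ 0.1
```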
Divergences
 Kullback–Leibler Divergence
 Rényi’s divergence
 Jensen–Shannon divergence
 Bhattacharyya distance (despite its name it is not a distance, as it violates the triangle inequality)
 f-divergence: generalizes several distances and divergences
 Discriminability index, specifically the Bayes discriminability index, is a positive-definite symmetric measure of the overlap of two distributions.
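A small sketch contrasting KL divergence (asymmetric, which is why it is a divergence rather than a metric) with Jensen–Shannon (a symmetrized, smoothed KL); the example distributions are arbitrary and assumed to have full support:

```python
import math

def kl(p, q):
    """Kullback-Leibler divergence D(p || q) for discrete distributions.
    Not symmetric, so it is not a distance metric."""
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)

def js(p, q):
    """Jensen-Shannon divergence: average KL to the midpoint mixture.
    Symmetric in p and q (and its square root is a true metric)."""
    m = [(pi + qi) / 2 for pi, qi in zip(p, q)]
    return 0.5 * kl(p, m) + 0.5 * kl(q, m)

p, q = [0.8, 0.1, 0.1], [0.4, 0.3, 0.3]
print(kl(p, q), kl(q, p))  # two different values: KL is asymmetric
print(js(p, q), js(q, p))  # same value: JS is symmetric
```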