Distance Metric

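Choosing a good distance metric is important for getting a well-performing model, especially in methods such as nearest neighbours or clustering, where predictions are driven directly by distances between objects.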


Suppose two objects $x = (x_1, \dots, x_n)$ and $y = (y_1, \dots, y_n)$ both have $n$ features.

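From Lecture 13 of Carnegie. The Minkowski metric/distance of order $p$ is defined by

$$d(x, y) = \left( \sum_{i=1}^{n} |x_i - y_i|^p \right)^{1/p}$$

and is a metric for $p \ge 1$.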

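This generalizes two distance metrics you are very familiar with in ML:

  • $p = 1$: Manhattan distance, $d(x, y) = \sum_{i=1}^{n} |x_i - y_i|$
  • $p = 2$: Euclidean distance, $d(x, y) = \sqrt{\sum_{i=1}^{n} (x_i - y_i)^2}$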
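A minimal Python sketch of the Minkowski distance, assuming the features are plain numeric sequences of equal length. The function name `minkowski` is my own choice for illustration. Setting `p=1` and `p=2` recovers the Manhattan and Euclidean distances:

```python
from typing import Sequence


def minkowski(x: Sequence[float], y: Sequence[float], p: float) -> float:
    """Minkowski distance of order p between two feature vectors.

    p = 1 gives the Manhattan distance, p = 2 the Euclidean distance.
    """
    if len(x) != len(y):
        raise ValueError("x and y must have the same number of features")
    if p < 1:
        # For p < 1 the triangle inequality can fail, so d is no longer a metric.
        raise ValueError("p must be >= 1 for d to be a metric")
    return sum(abs(a - b) ** p for a, b in zip(x, y)) ** (1.0 / p)


x = [0.0, 0.0]
y = [3.0, 4.0]
print(minkowski(x, y, 1))  # Manhattan: 7.0
print(minkowski(x, y, 2))  # Euclidean: 5.0
```

Larger $p$ puts more weight on the single largest feature difference, so the choice of $p$ changes which neighbours count as "close".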

Other Distance metrics

Properties that every distance metric should have

  • Symmetry: $d(x, y) = d(y, x)$
    • Otherwise you could claim that Alex looks like Bob, but Bob looks nothing like Alex
  • Constancy of Self-Similarity: $d(x, x) = d(y, y) = 0$ for all $x, y$
    • Otherwise you could claim that Alex looks more like Bob than Bob does
  • Positivity (Separation): $d(x, y) = 0$ if and only if $x = y$
    • Otherwise there are objects in your world that are different, but you cannot tell apart
  • Triangle Inequality: $d(x, y) \le d(x, z) + d(z, y)$
    • Otherwise you could claim Alex is very like Bob, and Alex is very like Carl, but Bob is very unlike Carl

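I also see non-negativity listed on Wikipedia:

  • Non-Negativity: $d(x, y) \ge 0$
    • This actually follows from the properties above: $0 = d(x, x) \le d(x, y) + d(y, x) = 2\,d(x, y)$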
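The properties above can be spot-checked numerically. Below is a minimal sketch, assuming a candidate distance is any Python function of two vectors; `metric_violations` and the sample points are hypothetical names and data chosen for illustration. Passing on a finite sample does not prove a function is a metric, but one violation proves it is not. Squared Euclidean distance is a handy counterexample: it satisfies everything except the triangle inequality.

```python
import itertools
import math
from typing import Callable, List, Sequence

Vector = Sequence[float]


def euclidean(x: Vector, y: Vector) -> float:
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(x, y)))


def squared_euclidean(x: Vector, y: Vector) -> float:
    # Popular as a loss, but not a metric: it can break the triangle inequality.
    return sum((a - b) ** 2 for a, b in zip(x, y))


def metric_violations(
    d: Callable[[Vector, Vector], float], points: List[Vector], tol: float = 1e-9
) -> List[str]:
    """Names of the metric properties that d violates on the given sample."""
    violated = set()
    for x in points:
        if abs(d(x, x)) > tol:
            violated.add("self-similarity")
    for x, y in itertools.product(points, repeat=2):
        dxy = d(x, y)
        if dxy < -tol:
            violated.add("non-negativity")
        if abs(dxy - d(y, x)) > tol:
            violated.add("symmetry")
        if list(x) != list(y) and abs(dxy) <= tol:
            violated.add("positivity (separation)")
    for x, y, z in itertools.product(points, repeat=3):
        # z is the intermediate object that x and y are both compared against.
        if d(x, y) > d(x, z) + d(z, y) + tol:
            violated.add("triangle inequality")
    return sorted(violated)


pts = [[0.0, 0.0], [1.0, 0.0], [2.0, 0.0], [0.0, 1.0]]
print(metric_violations(euclidean, pts))          # []
print(metric_violations(squared_euclidean, pts))  # ['triangle inequality']
```

With the collinear points $(0,0)$, $(1,0)$, $(2,0)$, squared Euclidean gives $4 > 1 + 1$, which is exactly the "Bob is very unlike Carl" situation.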


Statistical distances, taken from Wikipedia:

Metrics:

  • Total variation distance (sometimes just called “the” statistical distance)
  • Hellinger distance
  • Lévy–Prokhorov metric
  • Wasserstein metric: also known as the Kantorovich metric, or earth mover’s distance
  • Mahalanobis Distance
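A minimal sketch of three of these metrics for discrete distributions, assuming each distribution is a list of probabilities over a shared, sorted support on the real line. The function names are hypothetical. The Hellinger distance uses the normalization that keeps it in $[0, 1]$, and the Wasserstein sketch covers only the one-dimensional case, where the earth mover's distance reduces to the area between the two CDFs.

```python
import math
from itertools import accumulate
from typing import Sequence


def _check(p: Sequence[float], q: Sequence[float]) -> None:
    if len(p) != len(q):
        raise ValueError("distributions must share the same support")


def total_variation(p: Sequence[float], q: Sequence[float]) -> float:
    """TV(P, Q) = 1/2 * sum_i |p_i - q_i|, in [0, 1]."""
    _check(p, q)
    return 0.5 * sum(abs(a - b) for a, b in zip(p, q))


def hellinger(p: Sequence[float], q: Sequence[float]) -> float:
    """H(P, Q) = sqrt(1/2 * sum_i (sqrt(p_i) - sqrt(q_i))^2), in [0, 1]."""
    _check(p, q)
    return math.sqrt(0.5 * sum((math.sqrt(a) - math.sqrt(b)) ** 2 for a, b in zip(p, q)))


def wasserstein_1d(p: Sequence[float], q: Sequence[float], support: Sequence[float]) -> float:
    """W1 on sorted points x_i: sum_i |F_P(x_i) - F_Q(x_i)| * (x_{i+1} - x_i)."""
    _check(p, q)
    if len(support) != len(p):
        raise ValueError("support must have one point per probability")
    cdf_p = list(accumulate(p))
    cdf_q = list(accumulate(q))
    return sum(
        abs(cdf_p[i] - cdf_q[i]) * (support[i + 1] - support[i])
        for i in range(len(support) - 1)
    )


support = [0.0, 1.0, 2.0]
P = [1.0, 0.0, 0.0]   # all mass at 0
Q1 = [0.0, 1.0, 0.0]  # all mass at 1
Q2 = [0.0, 0.0, 1.0]  # all mass at 2
print(total_variation(P, Q1), total_variation(P, Q2))               # 1.0 1.0
print(hellinger(P, Q1), hellinger(P, Q2))                           # 1.0 1.0
print(wasserstein_1d(P, Q1, support), wasserstein_1d(P, Q2, support))  # 1.0 2.0
```

The example shows a design difference: total variation and Hellinger only see that the supports do not overlap, while the Wasserstein metric also accounts for how far the mass has to move.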
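
Divergences (these generally do not satisfy all of the metric properties above; e.g. the KL divergence is not symmetric):
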
  • Kullback-Leibler Divergence
  • Rényi’s divergence
  • Jensen–Shannon divergence
  • Bhattacharyya distance (despite its name it is not a distance, as it violates the triangle inequality)
  • f-divergence: generalizes several distances and divergences
  • Discriminability index: specifically, the Bayes discriminability index is a positive-definite symmetric measure of the overlap of two distributions.
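A minimal sketch of three of these divergences for discrete distributions (probability lists over a shared support, natural logarithm, so values are in nats), used to check which metric properties they break. The function names and example distributions are hypothetical. KL fails symmetry, Jensen–Shannon is symmetric by construction, and the Bhattacharyya distance can fail the triangle inequality.

```python
import math
from typing import Sequence


def kl_divergence(p: Sequence[float], q: Sequence[float]) -> float:
    """D_KL(P || Q) = sum_i p_i * ln(p_i / q_i).

    Terms with p_i = 0 contribute 0; if q_i = 0 where p_i > 0 the result is infinite.
    """
    total = 0.0
    for a, b in zip(p, q):
        if a == 0:
            continue
        if b == 0:
            return math.inf
        total += a * math.log(a / b)
    return total


def js_divergence(p: Sequence[float], q: Sequence[float]) -> float:
    """Jensen-Shannon: average KL of P and Q to their mixture M = (P + Q) / 2."""
    m = [(a + b) / 2 for a, b in zip(p, q)]
    return 0.5 * kl_divergence(p, m) + 0.5 * kl_divergence(q, m)


def bhattacharyya_distance(p: Sequence[float], q: Sequence[float]) -> float:
    """D_B = -ln(sum_i sqrt(p_i * q_i)); infinite when the supports are disjoint."""
    bc = sum(math.sqrt(a * b) for a, b in zip(p, q))
    return math.inf if bc == 0 else -math.log(bc)


P = [0.5, 0.5]
Q = [0.9, 0.1]
print(kl_divergence(P, Q), kl_divergence(Q, P))    # two different values
print(js_divergence(P, Q) == js_divergence(Q, P))  # True

A = [0.99, 0.01]
B = [0.5, 0.5]
C = [0.01, 0.99]
# Triangle inequality check: D_B(A, C) <= D_B(A, B) + D_B(B, C)?
print(bhattacharyya_distance(A, C) > bhattacharyya_distance(A, B) + bhattacharyya_distance(B, C))  # True
```

In the last check $D_B(A, C) \approx 1.61$ while $D_B(A, B) + D_B(B, C) \approx 0.51$, which is why the Bhattacharyya "distance" is listed among the divergences rather than the metrics.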