Entropy (Information Theory)

Entropy quantifies uncertainty.

In information theory, the entropy of a random variable is the average level of “information”, “surprise”, or “uncertainty” inherent to the variable’s possible outcomes.

Given a discrete random variable $X$, which takes values in the alphabet $\mathcal{X}$ and is distributed according to $p : \mathcal{X} \to [0, 1]$, its entropy is

$$H(X) = -\sum_{x \in \mathcal{X}} p(x) \log p(x)$$

Entropy = measure of uncertainty over a random variable X = the number of bits required to encode X (on average, when the logarithm is base 2)

  • The more spread out your data, the higher the entropy.
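A minimal sketch of the definition above, using base-2 logarithms so the result is in bits (the `entropy` helper is my own, not from a library):

```python
import numpy as np

def entropy(p):
    """Shannon entropy in bits: H = -sum p_i log2 p_i, with 0 log 0 := 0."""
    p = np.asarray(p, dtype=float)
    nz = p[p > 0]  # drop zero-probability outcomes (they contribute nothing)
    return float(-np.sum(nz * np.log2(nz)))

# A fair coin is maximally uncertain: 1 bit per outcome.
print(entropy([0.5, 0.5]))   # 1.0
# A biased coin is more predictable, so lower entropy.
print(entropy([0.9, 0.1]))   # ~0.469
# Spread-out (uniform) data over 8 outcomes: 3 bits, the maximum for 8 outcomes.
print(entropy([1 / 8] * 8))  # 3.0
```

The last two calls illustrate the bullet point: the more spread out the distribution, the higher the entropy, peaking at the uniform distribution.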

Question

How is the entropy formula derived?
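A partial answer, sketched from the standard axiomatic route (this is the textbook derivation, not something worked out in this note): additivity for independent events forces the "surprise" of an outcome to be logarithmic in its probability, and entropy is then the average surprise.

```latex
% Surprisal I(p) must satisfy I(p \cdot q) = I(p) + I(q) for independent
% events, plus continuity and I decreasing in p; this forces
%   I(p) = -\log p.
% Entropy is the expected surprisal over the outcomes of X:
H(X) = \mathbb{E}\,[I(X)] = -\sum_{x \in \mathcal{X}} p(x) \log p(x)
```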

Comparison

Entropy in information theory is directly analogous to entropy in statistical thermodynamics. The analogy arises when the values of the random variable designate energies of microstates, so the Gibbs formula for entropy is formally identical to Shannon's formula. Entropy is also relevant to other areas of mathematics such as combinatorics and machine learning. The definition can be derived from a set of axioms establishing that entropy should be a measure of how "surprising" the average outcome of a variable is. For a continuous random variable, the analogous quantity is differential entropy.

From Sinkhorn Divergence

I have yet to fully understand this. It seems to have something to do with making the objective function convex?

The entropy of a matrix $P$ is given by

$$H(P) = -\sum_{i,j} P_{i,j} \log P_{i,j}$$

Low entropy = sparse matrix, i.e. most of the non-zero values are concentrated in a few entries. The lower the entropy, the closer we are to the original (unregularized) solution of the EMD.
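A small sketch of that sparse-vs-dense intuition, assuming the element-wise matrix entropy $-\sum_{i,j} P_{i,j}\log P_{i,j}$ (the form commonly used in entropic regularization of optimal transport; the helper and example matrices here are illustrative, not from any library):

```python
import numpy as np

def matrix_entropy(P):
    """Element-wise entropy H(P) = -sum_ij P_ij log P_ij, with 0 log 0 := 0."""
    P = np.asarray(P, dtype=float)
    nz = P[P > 0]
    return float(-np.sum(nz * np.log(nz)))

# Sparse coupling: all mass sits on a few entries -> low entropy.
# This is the shape an exact EMD solution tends to have.
sparse = np.array([[0.5, 0.0],
                   [0.0, 0.5]])

# Dense coupling: mass spread over every entry -> high entropy.
# Stronger entropic regularization pushes the solution toward this.
dense = np.full((2, 2), 0.25)

print(matrix_entropy(sparse))  # log 2 ~ 0.693
print(matrix_entropy(dense))   # log 4 ~ 1.386
```

Both matrices sum to 1, so the comparison isolates how concentration of mass, not total mass, drives the entropy down.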