Entropy (Information Theory)

Not to be confused with Entropy (Thermodynamics), but they are directly analogous (more below).

Entropy quantifies uncertainty.

In information theory, the entropy of a random variable is the average level of “information”, “surprise”, or “uncertainty” inherent to the variable’s possible outcomes. Given a discrete random variable $X$, which takes values in the alphabet $\mathcal{X}$ and is distributed according to $p : \mathcal{X} \to [0, 1]$:

Entropy = a measure of uncertainty over a random variable X = the average number of bits required to encode X (when the log is taken base 2)
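A minimal numeric sketch of the formula just below (the function name and example distributions are mine, not from this note): the fair coin is maximally spread out and so has higher entropy than the biased one.

```python
# Sketch: Shannon entropy of a discrete distribution, in bits.
import math

def shannon_entropy(probs, base=2):
    """H(X) = -sum_x p(x) * log p(x), skipping zero-probability outcomes."""
    return -sum(p * math.log(p, base) for p in probs if p > 0)

fair_coin = [0.5, 0.5]     # maximally spread out for 2 outcomes
biased_coin = [0.9, 0.1]   # more predictable

print(shannon_entropy(fair_coin))    # 1.0 bit
print(shannon_entropy(biased_coin))  # ~0.469 bits
```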

$$H(X) = -\sum_{x \in \mathcal{X}} p(x) \log p(x)$$

![[attachments/Screen Shot 2022-12-23 at 2.48.22 PM.png]]

- The more spread out your data, the higher the entropy.

> [!question] Question
>
> How is the entropy formula derived?

### Comparison

Entropy in information theory is directly analogous to entropy in statistical thermodynamics. The analogy arises when the values of the random variable designate the energies of microstates, so Gibbs' formula for the entropy, $S = -k_B \sum_i p_i \ln p_i$, is formally identical to Shannon's formula. Entropy is also relevant to other areas of mathematics, such as combinatorics and machine learning. The definition can be derived from a set of axioms establishing that entropy should be a measure of how "surprising" the average outcome of a variable is. For a continuous random variable, the analogous quantity is differential entropy.

### From [[notes/Sinkhorn Distance|Sinkhorn Divergence]]

I have yet to fully understand this. It seems to have something to do with making the objective [[notes/Convex Optimization|convex]]? The entropy of a matrix is given by

$$H(P) = - \sum_{ij} P_{ij} \log P_{ij}$$

Low entropy = sparse matrix, i.e. most of the non-zero values are concentrated in a few entries. The lower the entropy, the closer we are to the original solution of the [[notes/Earth Mover Distance|EMD]] (see the sketch at the end of this note).

### Related

- [[notes/Maximum Entropy|Maximum Entropy]]
- [[notes/Constrained Optimization|Constrained Optimization]]
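A numeric sketch of the matrix entropy $H(P)$ from the Sinkhorn section above (the NumPy helper and example couplings are my own illustration, not from the note): a coupling whose mass is concentrated in a few entries has lower entropy than one that spreads mass evenly.

```python
# Sketch: H(P) = -sum_ij P_ij log P_ij for a coupling matrix P.
import numpy as np

def matrix_entropy(P):
    P = np.asarray(P, dtype=float)
    nz = P[P > 0]                 # skip zero entries (0 log 0 := 0)
    return -np.sum(nz * np.log(nz))

uniform = np.full((4, 4), 1 / 16)              # mass spread evenly
concentrated = np.zeros((4, 4))
concentrated[np.arange(4), np.arange(4)] = 0.25  # mass only on the diagonal

print(matrix_entropy(uniform))       # ~2.77 (= log 16), high entropy
print(matrix_entropy(concentrated))  # ~1.39 (= log 4), low entropy
```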