Maximum Entropy

Maximum entropy is useful in practice, because the world is full of uncertainty.

The entropy is given by $H(X) = -\sum_{x \in X} p(x) \log p(x)$.
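As a quick sanity check on the formula, here is a minimal sketch that computes the entropy of a few discrete distributions. The function name `entropy` and the example distributions are made up for illustration, and the natural log is assumed (so the result is in nats).

```python
import math

def entropy(probs):
    """Shannon entropy H = -sum p log p, in nats.

    Terms with p == 0 are skipped, following the convention 0 * log 0 = 0.
    """
    return sum(-p * math.log(p) for p in probs if p > 0)

# Hypothetical distributions over four outcomes.
uniform = [0.25, 0.25, 0.25, 0.25]        # maximally uncertain
peaked = [0.97, 0.01, 0.01, 0.01]         # almost certain
deterministic = [1.0, 0.0, 0.0, 0.0]      # no uncertainty at all

for name, dist in [("uniform", uniform), ("peaked", peaked), ("deterministic", deterministic)]:
    print(f"{name:>13}: H = {entropy(dist):.4f}")
```

The uniform distribution has the largest entropy ($\log 4$ here), and a deterministic one has zero entropy.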

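Maximum entropy gives another way to formulate the objective: add an entropy bonus, weighted by $\beta$, to the expected reward. This takes uncertainty into account and makes the policy more robust.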

$V = \max_{\pi(a)} \left( E[r(a)] + \beta H(\pi(a)) \right)$

We use constrained optimization (maximizing over $\pi$ subject to $\sum_a \pi(a) = 1$) to come up with a set of equations for the optimal policy.

This is a really important derivation.
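A sketch of how that derivation can go, assuming a finite action set, $\beta > 0$, and a Lagrange multiplier $\lambda$ for the normalization constraint (the symbols $\mathcal{L}$, $\lambda$ and $\pi^*$ are introduced here and are not in the original notes):

$$
\begin{aligned}
\mathcal{L}(\pi, \lambda) &= \sum_a \pi(a)\, r(a) - \beta \sum_a \pi(a) \log \pi(a) + \lambda \Big( \sum_a \pi(a) - 1 \Big) \\
\frac{\partial \mathcal{L}}{\partial \pi(a)} &= r(a) - \beta \big( \log \pi(a) + 1 \big) + \lambda = 0 \\
\Rightarrow \quad \pi(a) &= \exp\!\Big( \frac{r(a) + \lambda - \beta}{\beta} \Big) \propto \exp\!\Big( \frac{r(a)}{\beta} \Big) \\
\pi^*(a) &= \frac{\exp(r(a)/\beta)}{\sum_{a'} \exp(r(a')/\beta)} \\
V &= E_{\pi^*}[r(a)] + \beta H(\pi^*) = \beta \log \sum_a \exp\!\Big( \frac{r(a)}{\beta} \Big)
\end{aligned}
$$

So the optimal policy is a softmax over rewards with temperature $\beta$, and the optimal value is a "soft max" (log-sum-exp) of the rewards. The constraint $\pi(a) \geq 0$ never binds, because the exponential is always positive.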

The intuition: you want to maximize entropy so that the policy is not as deterministic.
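To see this numerically, here is a small sketch. It relies on the standard result that the maximizer of $E[r(a)] + \beta H(\pi(a))$ is the softmax policy $\pi(a) \propto \exp(r(a)/\beta)$. The reward values and the helper names `softmax_policy` and `entropy` are hypothetical.

```python
import math

def softmax_policy(rewards, beta):
    """Softmax over rewards with temperature beta (assumes beta > 0)."""
    m = max(rewards)  # subtract the max for numerical stability
    weights = [math.exp((r - m) / beta) for r in rewards]
    z = sum(weights)
    return [w / z for w in weights]

def entropy(probs):
    """Shannon entropy in nats, with 0 * log 0 treated as 0."""
    return sum(-p * math.log(p) for p in probs if p > 0)

rewards = [1.0, 0.8, 0.2]  # hypothetical rewards for three actions

for beta in (0.01, 0.5, 10.0):
    pi = softmax_policy(rewards, beta)
    rounded = [round(p, 3) for p in pi]
    print(f"beta={beta:<5} policy={rounded} entropy={entropy(pi):.3f}")
```

With a small $\beta$ the policy is close to the greedy, deterministic choice. As $\beta$ grows, the entropy term dominates and the policy spreads toward uniform.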

Max-entropy value iteration: $V_{k}(s) = \max_{\pi} \left( E[R(s, a) + V_{k-1}(s')] + \beta H(\pi(a \mid s)) \right)$
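A minimal sketch of this update on a tiny tabular MDP. It uses the closed form of the inner maximization, $V_k(s) = \beta \log \sum_a \exp(Q_k(s, a)/\beta)$ with $Q_k(s, a) = R(s, a) + \sum_{s'} P(s' \mid s, a) V_{k-1}(s')$, and assumes $V_0 = 0$, no discounting (as in the formula above), and $\beta > 0$. The two-state MDP, the data layout, and the name `soft_value_iteration` are all made up for illustration.

```python
import math

def soft_value_iteration(P, R, beta, horizon):
    """Max-entropy value iteration for a finite horizon.

    P[s][a] is a list of (next_state, probability) pairs.
    R[s][a] is the immediate reward.
    Returns the values V_horizon and the softmax policy from the last step.
    """
    n_states = len(R)
    V = [0.0] * n_states  # assumption: V_0(s) = 0
    policy = None
    for _ in range(horizon):
        new_V, policy = [], []
        for s in range(n_states):
            # Q_k(s, a) = R(s, a) + E[V_{k-1}(s')]
            q = [R[s][a] + sum(p * V[s2] for s2, p in P[s][a])
                 for a in range(len(R[s]))]
            m = max(q)  # subtract the max for a stable log-sum-exp
            weights = [math.exp((qa - m) / beta) for qa in q]
            z = sum(weights)
            new_V.append(m + beta * math.log(z))   # soft max over actions
            policy.append([w / z for w in weights])  # pi(a | s)
        V = new_V
    return V, policy

# Hypothetical MDP: in each state, action 0 goes to state 0, action 1 to state 1.
P = [
    [[(0, 1.0)], [(1, 1.0)]],
    [[(0, 1.0)], [(1, 1.0)]],
]
R = [
    [0.0, 1.0],
    [0.0, 2.0],
]

for beta in (0.01, 1.0):
    V, pi = soft_value_iteration(P, R, beta, horizon=3)
    print(f"beta={beta}: V={[round(v, 3) for v in V]}, pi(.|s=0)={[round(p, 3) for p in pi[0]]}")
```

As $\beta \to 0$ this approaches ordinary value iteration (a hard max over actions). A larger $\beta$ gives higher soft values and a more stochastic policy.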

🛠️ Steven Gong