🛠️ Steven Gong

Dec 24, 2022, 1 min read

Optimal Policy

Define a partial ordering over policies: $π \geq π'$ if $v_{π}(s) \geq v_{π'}(s), \forall s$
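A minimal sketch of this ordering, assuming the state values of each policy are already known. The helper name `policy_geq` and the numbers are hypothetical and only illustrate why the ordering is partial: two policies can each be better in a different state, so neither dominates the other.

```python
def policy_geq(v_a, v_b):
    """Return True if v_a(s) >= v_b(s) for every state s."""
    return all(v_a[s] >= v_b[s] for s in v_a)

# Hypothetical state values for three policies on a two-state problem.
v_pi1 = {"s0": 1.0, "s1": 2.0}
v_pi2 = {"s0": 0.5, "s1": 2.0}
v_pi3 = {"s0": 2.0, "s1": 1.0}

print(policy_geq(v_pi1, v_pi2))  # True: pi1 is at least as good in every state
print(policy_geq(v_pi1, v_pi3))  # False
print(policy_geq(v_pi3, v_pi1))  # False: pi1 and pi3 are incomparable
```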

Theorem

For any MDP,

There exists an optimal policy $π_{*}$ that is better than or equal to all other policies, $π_{*} \geq π, \forall π$

All optimal policies achieve the Optimal Value Function, $v_{π_{*}} (s) = v_{*} (s)$

All optimal policies achieve the optimal action-value function, $q_{π_{*}} (s, a) = q_{*} (s, a)$
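One practical consequence: once $q_{*}$ is known, acting greedily with respect to it yields a deterministic optimal policy, and $v_{*}(s) = \max_{a} q_{*}(s, a)$. The sketch below assumes a small table of $q_{*}$ values; the states, actions, numbers, and the helper `greedy_policy` are made up for illustration.

```python
def greedy_policy(q):
    """Map each state to an action with the largest action value."""
    return {s: max(actions, key=actions.get) for s, actions in q.items()}

# Hypothetical optimal action values q_*(s, a).
q_star = {
    "s0": {"left": 0.0, "right": 1.0},
    "s1": {"left": 2.0, "right": 0.5},
}

pi_star = greedy_policy(q_star)
v_star = {s: max(actions.values()) for s, actions in q_star.items()}

print(pi_star)  # {'s0': 'right', 's1': 'left'}
print(v_star)   # {'s0': 1.0, 's1': 2.0}
```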

How do we arrive at the $q_{*}$ values? We use the Bellman Equation (in its optimality form), which is central to solving for them.
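For reference, the optimality form can be written as follows. The reward $\mathcal{R}_s^a$, transition probability $\mathcal{P}_{ss'}^a$, and discount factor $\gamma$ are notation assumed here, not defined in this note.

$$
\begin{aligned}
v_{*}(s) &= \max_{a} q_{*}(s, a) \\
q_{*}(s, a) &= \mathcal{R}_s^a + \gamma \sum_{s'} \mathcal{P}_{ss'}^a \, v_{*}(s') \\
            &= \mathcal{R}_s^a + \gamma \sum_{s'} \mathcal{P}_{ss'}^a \max_{a'} q_{*}(s', a')
\end{aligned}
$$

The last line is a fixed-point equation in $q_{*}$ alone. The $\max$ makes it nonlinear, so it is typically solved iteratively rather than in closed form.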

See Dynamic Programming.

Evaluation and Control
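As one concrete dynamic-programming route to $q_{*}$, here is a minimal sketch of Q-value iteration, which repeatedly applies the Bellman optimality backup until the values stop changing. The two-state MDP, the discount factor, and the function name `q_value_iteration` are all assumptions made for illustration, not something defined in this note.

```python
GAMMA = 0.9  # assumed discount factor

# Hypothetical MDP: P[s][a] is a list of (probability, next_state, reward).
P = {
    "s0": {"stay": [(1.0, "s0", 0.0)], "go": [(1.0, "s1", 1.0)]},
    "s1": {"stay": [(1.0, "s1", 2.0)], "go": [(1.0, "s0", 0.0)]},
}

def q_value_iteration(P, gamma, tol=1e-10, max_iters=10_000):
    """Iterate the Bellman optimality backup on q until it converges."""
    q = {s: {a: 0.0 for a in P[s]} for s in P}
    for _ in range(max_iters):
        delta = 0.0
        new_q = {}
        for s in P:
            new_q[s] = {}
            for a, outcomes in P[s].items():
                # Expected reward plus discounted value of the best next action.
                backup = sum(
                    p * (r + gamma * max(q[s2].values()))
                    for p, s2, r in outcomes
                )
                delta = max(delta, abs(backup - q[s][a]))
                new_q[s][a] = backup
        q = new_q
        if delta < tol:
            break
    return q

q_star = q_value_iteration(P, GAMMA)
pi_star = {s: max(q_star[s], key=q_star[s].get) for s in q_star}
print(pi_star)  # {'s0': 'go', 's1': 'stay'}
```

This toy problem can be checked by hand: staying in `s1` forever earns $2 / (1 - 0.9) = 20$, so $q_{*}(s_1, \text{stay}) = 20$ and $q_{*}(s_0, \text{go}) = 1 + 0.9 \cdot 20 = 19$.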
