Define a partial ordering over policies
For any MDP,
- There exists an optimal policy that is better than or equal to all other policies,
- All optimal policies achieve the Optimal Value Function,
- All optimal policies achieve the optimal action-value function,
How to arrive at values? We need to use Bellman Equation, central to solving these.
See Dynamic Programming.