Optimal Policy

Define a partial ordering over policies: π ≥ π′ if v_π(s) ≥ v_π′(s) for all states s.

Theorem

For any MDP,

  • There exists an optimal policy π* that is better than or equal to all other policies: π* ≥ π for all π.
  • All optimal policies achieve the optimal value function: v_π*(s) = v*(s).
  • All optimal policies achieve the optimal action-value function: q_π*(s, a) = q*(s, a).
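The theorem can be checked by brute force on a small example. The sketch below is illustrative only: the two-state, two-action MDP, its transition probabilities, rewards, and the discount factor are all made-up assumptions, and the helper names (evaluate, dominates) are hypothetical. It enumerates only deterministic policies, which is enough for a finite MDP because at least one optimal policy is deterministic.

```python
from itertools import product

# Hypothetical 2-state, 2-action MDP; every number here is an illustrative assumption.
# P[s][a] is a list of next-state probabilities, R[s][a] the expected immediate reward.
P = [
    [[0.9, 0.1], [0.1, 0.9]],  # from state 0: action 0, action 1
    [[0.2, 0.8], [0.7, 0.3]],  # from state 1: action 0, action 1
]
R = [
    [1.0, 0.0],  # state 0: reward for action 0, action 1
    [0.0, 2.0],  # state 1: reward for action 0, action 1
]
GAMMA = 0.9  # assumed discount factor
STATES = range(2)
ACTIONS = range(2)


def evaluate(policy, sweeps=1000):
    """Iterative policy evaluation for a deterministic policy (tuple: state -> action)."""
    v = [0.0 for _ in STATES]
    for _ in range(sweeps):
        v = [
            R[s][policy[s]]
            + GAMMA * sum(P[s][policy[s]][s2] * v[s2] for s2 in STATES)
            for s in STATES
        ]
    return v


def dominates(v_a, v_b, tol=1e-9):
    """True if v_a(s) >= v_b(s) in every state, i.e. the ordering pi_a >= pi_b holds."""
    return all(a >= b - tol for a, b in zip(v_a, v_b))


# Enumerate all deterministic policies and evaluate each one.
policies = list(product(ACTIONS, repeat=len(STATES)))
values = {pi: evaluate(pi) for pi in policies}

# Optimal policies are those that dominate every policy in the enumeration.
optimal = [
    pi for pi in policies
    if all(dominates(values[pi], values[other]) for other in policies)
]

for pi in policies:
    tag = "optimal" if pi in optimal else ""
    print(pi, [round(x, 3) for x in values[pi]], tag)
```

Because the ordering is only partial, two arbitrary policies may be incomparable (each better in a different state), yet the list of optimal policies is never empty, and every policy in it has the same value in every state.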

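How do we arrive at these optimal values? We use the Bellman equation, which expresses the value of a state recursively in terms of the values of its successor states and is central to solving MDPs.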
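As one possible illustration, the Bellman optimality equation, v*(s) = max_a [ R(s, a) + γ Σ_s′ P(s′ | s, a) v*(s′) ], can be applied repeatedly as an update rule until the values stop changing. This is value iteration, a technique chosen here for the sketch rather than one named in these notes. The MDP, its numbers, the discount factor, and the function names below are all hypothetical assumptions.

```python
# Hypothetical 2-state, 2-action MDP; every number here is an illustrative assumption.
# P[s][a] is a list of next-state probabilities, R[s][a] the expected immediate reward.
P = [
    [[0.9, 0.1], [0.1, 0.9]],  # from state 0: action 0, action 1
    [[0.2, 0.8], [0.7, 0.3]],  # from state 1: action 0, action 1
]
R = [
    [1.0, 0.0],  # state 0: reward for action 0, action 1
    [0.0, 2.0],  # state 1: reward for action 0, action 1
]
GAMMA = 0.9  # assumed discount factor


def q_value(v, s, a):
    """One-step lookahead: immediate reward plus discounted expected next-state value."""
    return R[s][a] + GAMMA * sum(p * v[s2] for s2, p in enumerate(P[s][a]))


def value_iteration(tol=1e-10):
    """Apply the Bellman optimality backup until the largest change falls below tol."""
    v = [0.0] * len(P)
    while True:
        new_v = [
            max(q_value(v, s, a) for a in range(len(P[s])))
            for s in range(len(P))
        ]
        if max(abs(x - y) for x, y in zip(new_v, v)) < tol:
            return new_v
        v = new_v


v_star = value_iteration()

# Acting greedily with respect to the converged values gives a deterministic policy.
greedy = [
    max(range(len(P[s])), key=lambda a: q_value(v_star, s, a))
    for s in range(len(P))
]

print("values:", [round(x, 3) for x in v_star])
print("greedy policy (state -> action):", greedy)
```

The loop terminates because, for γ < 1, each backup shrinks the largest difference between successive value estimates by at least a factor of γ.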

See Dynamic Programming.