Optimal Value Function

The optimal value function specifies the best possible performance in the MDP.
An MDP is “solved” when we know the optimal value function.

Definition

The optimal state-value function $v_{*}(s)$ is the maximum value function over all policies, for all $s \in S$:

$v_{*}(s) = \max_{\pi} v_{\pi}(s)$

The optimal action-value function $q_{*}(s, a)$ is the maximum action-value function over all policies:

$q_{*}(s, a) = \max_{\pi} q_{\pi}(s, a)$
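To make the "maximum over all policies" concrete, here is a minimal sketch that evaluates every deterministic policy of a tiny MDP and takes the state-wise maximum. The MDP itself (the table `P`, the discount `GAMMA`, and the helper names) is a made-up illustration, not something from these notes. It also relies on the standard fact that, for a finite MDP, searching over deterministic policies is enough to reach $v_{*}$.

```python
from itertools import product

GAMMA = 0.9  # assumed discount factor for this toy example

# Hypothetical toy MDP: P[s][a] is a list of (probability, next_state, reward).
P = {
    0: {0: [(1.0, 0, 0.0)],                  # stay in state 0, no reward
        1: [(0.8, 1, 1.0), (0.2, 0, 0.0)]},  # usually reach state 1 with reward 1
    1: {0: [(1.0, 1, 2.0)],                  # stay in state 1, reward 2
        1: [(1.0, 0, 0.0)]},                 # go back to state 0, no reward
}


def evaluate_policy(policy, tol=1e-10):
    """Iteratively compute v_pi for a deterministic policy {state: action}."""
    v = {s: 0.0 for s in P}
    while True:
        delta = 0.0
        for s in P:
            # Expectation over the environment's (next_state, reward) outcomes.
            new_v = sum(p * (r + GAMMA * v[s2]) for p, s2, r in P[s][policy[s]])
            delta = max(delta, abs(new_v - v[s]))
            v[s] = new_v
        if delta < tol:
            return v


def brute_force_v_star():
    """v_*(s) = max over policies of v_pi(s), by enumerating all policies."""
    states = sorted(P)
    values = []
    for actions in product(*(sorted(P[s]) for s in states)):
        policy = dict(zip(states, actions))
        values.append(evaluate_policy(policy))
    return {s: max(v[s] for v in values) for s in states}


v_star = brute_force_v_star()
```

Enumeration grows exponentially with the number of states, so this is only useful for seeing the definition at work on a very small problem.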

It is very similar to the original Value Function, but rather than taking the expected value over the actions chosen by some policy, we simply take the maximum of the values over our choices.

Backup Diagrams for $q_{*}$ and $v_{*}$

[Figure: backup diagrams for $v_{*}$ and $q_{*}$]

Optimal State-Value Function

$v_{*}(s) = \max_{a} q_{*}(s, a)$

$v_{*}(s) = \max_{a} \sum_{s', r} p(s', r \mid s, a) (r + \gamma v_{*}(s'))$

Optimal Action-Value Function

IMPORTANT: For $q_{*}(s, a)$ we take the expectation (a probability-weighted average) over next states and rewards, not the max, because we cannot control what the environment does to us. The max appears only over the next action $a'$, which we do control.

$q_{*}(s, a) = \sum_{s', r} p(s', r \mid s, a) (r + \gamma v_{*}(s'))$

$q_{*}(s, a) = \sum_{s', r} p(s', r \mid s, a) (r + \gamma \max_{a'} q_{*}(s', a'))$
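The two backups can be turned into a simple fixed-point iteration. The sketch below uses value iteration, a standard technique that these notes do not name, on a made-up two-state MDP. The table `P`, the discount `GAMMA`, and the function names are all assumptions for illustration. Note where the sum over the environment's outcomes sits and where the max over actions sits.

```python
GAMMA = 0.9  # assumed discount factor for this toy example

# Hypothetical toy MDP: P[s][a] is a list of (probability, next_state, reward).
P = {
    0: {0: [(1.0, 0, 0.0)],
        1: [(0.8, 1, 1.0), (0.2, 0, 0.0)]},
    1: {0: [(1.0, 1, 2.0)],
        1: [(1.0, 0, 0.0)]},
}


def q_from_v(v, s, a):
    """q(s, a) = sum over (s', r) of p(s', r | s, a) * (r + gamma * v(s'))."""
    return sum(p * (r + GAMMA * v[s2]) for p, s2, r in P[s][a])


def value_iteration(tol=1e-10):
    """Repeatedly apply v(s) <- max_a q(s, a) until the values stop changing."""
    v = {s: 0.0 for s in P}
    while True:
        delta = 0.0
        for s in P:
            best = max(q_from_v(v, s, a) for a in P[s])  # max over our actions
            delta = max(delta, abs(best - v[s]))
            v[s] = best
        if delta < tol:
            return v


v_star = value_iteration()
q_star = {(s, a): q_from_v(v_star, s, a) for s in P for a in P[s]}
```

On this toy problem the iteration settles near $v_{*}(0) \approx 18.54$ and $v_{*}(1) = 20$, and $\max_{a} q_{*}(s, a)$ matches $v_{*}(s)$ in both states, as the first equation for $v_{*}$ requires.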

The optimal value functions are closely related to the Optimal Policy: acting greedily with respect to $q_{*}$ gives an optimal policy. They are solved using the Bellman Equation.
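As a small illustration of the link to the optimal policy, the sketch below reads a greedy policy off a $q_{*}$ table. The numbers in `q_star`, the state and action names, and the helper `greedy_policy` are invented for the example; in practice the table would come from solving the Bellman optimality equation.

```python
# Hypothetical q_* table for illustration, keyed by (state, action).
q_star = {
    ("s0", "left"): 1.0,
    ("s0", "right"): 2.5,
    ("s1", "left"): 0.3,
    ("s1", "right"): 0.1,
}


def greedy_policy(q):
    """For each state, pick the action with the largest q-value."""
    best = {}
    for (s, a), value in q.items():
        if s not in best or value > q[(s, best[s])]:
            best[s] = a
    return best


policy = greedy_policy(q_star)

# v_*(s) = max_a q_*(s, a), which is the q-value of the greedy action.
v_star = {s: q_star[(s, a)] for s, a in policy.items()}
```

Ties are broken in favour of whichever action was seen first, which is an arbitrary choice: any action that attains the maximum is equally good.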

🛠️ Steven Gong
