# Optimal Value Function

- The optimal value function specifies the best possible performance in the MDP.
- An MDP is “solved” when we know the optimal value function.

**Definition**

The **optimal state-value function** $v_{∗}(s)$ is the maximum value function over all policies, for all $s \in \mathcal{S}$:

$v_{∗}(s) = \max_{\pi} v_{\pi}(s)$

The **optimal action-value function** $q_{∗}(s,a)$ is the maximum action-value function over all policies:

$q_{∗}(s,a) = \max_{\pi} q_{\pi}(s,a)$

It is very similar to the ordinary value function, but rather than taking the expected value under some fixed policy, we take the maximum of the values over all policies.
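To make the definition $v_{∗}(s) = \max_{\pi} v_{\pi}(s)$ concrete, for a small finite MDP we can brute-force it: evaluate every deterministic policy and take the pointwise maximum. A minimal sketch on a hypothetical two-state MDP (the transition table `P`, its states, actions, and rewards are invented for illustration):

```python
# v_*(s) = max over policies of v_pi(s): brute force over the four
# deterministic policies of a tiny, made-up two-state MDP.
from itertools import product

# P[s][a] = list of (prob, next_state, reward) triples (hypothetical MDP)
P = {
    0: {0: [(1.0, 0, 0.0)], 1: [(0.8, 1, 1.0), (0.2, 0, 0.0)]},
    1: {0: [(1.0, 1, 2.0)], 1: [(1.0, 0, 0.0)]},
}
gamma = 0.9

def evaluate(policy):
    """Iterative policy evaluation: v(s) <- sum_{s',r} p(s',r|s,a) (r + gamma v(s'))."""
    v = {s: 0.0 for s in P}
    for _ in range(2000):  # enough sweeps for gamma^k to vanish
        v = {s: sum(p * (r + gamma * v[s2]) for p, s2, r in P[s][policy[s]])
             for s in P}
    return v

# Enumerate every deterministic policy (one action per state), evaluate each,
# then take the maximum value per state.
values = [evaluate({0: a0, 1: a1}) for a0, a1 in product([0, 1], repeat=2)]
v_star = {s: max(v[s] for v in values) for s in P}
```

Because a finite MDP always has a deterministic optimal policy, this pointwise maximum is attained by a single policy; the later Bellman-optimality formulation avoids the exponential enumeration.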

**Backup Diagrams for $q_{∗}$ and $v_{∗}$**
**Optimal State-Value Function**
$v_{∗}(s) = \max_{a} q_{∗}(s,a)$
$v_{∗}(s) = \max_{a} \sum_{s',r} p(s',r \mid s,a)\,(r + \gamma v_{∗}(s'))$
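The second equation is exactly the backup applied in value iteration: sweep over states, replacing $v(s)$ with the maximum expected one-step return. A minimal sketch on a hypothetical two-state MDP (the transition table `P` and its rewards are invented for illustration):

```python
# Value iteration: repeatedly apply the Bellman optimality backup
#   v(s) <- max_a sum_{s',r} p(s',r|s,a) * (r + gamma * v(s'))
# on a tiny, made-up MDP with states {0, 1} and actions {0, 1}.

# P[s][a] = list of (prob, next_state, reward) triples (hypothetical MDP)
P = {
    0: {0: [(1.0, 0, 0.0)], 1: [(0.8, 1, 1.0), (0.2, 0, 0.0)]},
    1: {0: [(1.0, 1, 2.0)], 1: [(1.0, 0, 0.0)]},
}
gamma = 0.9

v = {s: 0.0 for s in P}
for _ in range(1000):
    v_new = {
        s: max(
            sum(p * (r + gamma * v[s2]) for p, s2, r in P[s][a])
            for a in P[s]
        )
        for s in P
    }
    if max(abs(v_new[s] - v[s]) for s in P) < 1e-10:  # converged
        v = v_new
        break
    v = v_new
# v now approximates v_* for this MDP
```

The backup is a $\gamma$-contraction, so the sweep converges to $v_{∗}$ from any initial guess.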

**Optimal Action-Value function**
IMPORTANT: for $q_{∗}(s,a)$ we take the expectation over next states rather than a max, because we cannot control what the environment does to us.
$q_{∗}(s,a) = \sum_{s',r} p(s',r \mid s,a)\,(r + \gamma v_{∗}(s'))$
$q_{∗}(s,a) = \sum_{s',r} p(s',r \mid s,a)\,(r + \gamma \max_{a'} q_{∗}(s',a'))$
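The $q_{∗}$ recursion gives an action-value analogue of value iteration: expectation over the environment's transitions, max only over the agent's next action. A self-contained sketch on a hypothetical two-state MDP (the transition table `P` is invented for illustration):

```python
# Q-value iteration:
#   q(s,a) <- sum_{s',r} p(s',r|s,a) * (r + gamma * max_a' q(s',a'))
# Note: the sum (expectation) is over the environment; the max is over actions.

# P[s][a] = list of (prob, next_state, reward) triples (hypothetical MDP)
P = {
    0: {0: [(1.0, 0, 0.0)], 1: [(0.8, 1, 1.0), (0.2, 0, 0.0)]},
    1: {0: [(1.0, 1, 2.0)], 1: [(1.0, 0, 0.0)]},
}
gamma = 0.9

q = {(s, a): 0.0 for s in P for a in P[s]}
for _ in range(1000):
    q_new = {
        (s, a): sum(
            p * (r + gamma * max(q[(s2, a2)] for a2 in P[s2]))
            for p, s2, r in P[s][a]
        )
        for s in P for a in P[s]
    }
    if max(abs(q_new[k] - q[k]) for k in q) < 1e-10:  # converged
        q = q_new
        break
    q = q_new

# Recover v_*(s) = max_a q_*(s,a); the greedy policy is argmax_a q_*(s,a).
v = {s: max(q[(s, a)] for a in P[s]) for s in P}
```

Having $q_{∗}$ is enough to act optimally without a model: the greedy action maximizes $q_{∗}(s, \cdot)$ directly.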

Closely related to the Optimal Policy: acting greedily with respect to $q_{∗}$ yields one. Both are found by solving the Bellman Equation.