# Policy Evaluation

**Problem**: Evaluate a given policy $π$
**Solution**: Iterative application of Bellman Expectation Backup

- $v_{1}→v_{2}→...→v_{∞}=v_{π}$
- Using synchronous backups:
- At each iteration $k+1$, for all states $s∈S$, update $v_{k+1}(s)$ from $v_{k}(s')$, where $s'$ is a successor state of $s$:

$$v_{k+1}(s)=∑_{a}π(a∣s)∑_{s',r}p(s',r∣s,a)\left(r+γv_{k}(s')\right)$$

- Make sure to know this equation (the Bellman expectation backup) by heart, and the difference from the Bellman optimality equation
- You should also know the corresponding form for a deterministic policy's value function
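Reading "deterministic" as a deterministic policy $π(s)$ (one action per state), the outer sum over actions collapses, since all probability mass sits on the single action $π(s)$:

$$v_{k+1}(s)=∑_{s',r}p(s',r∣s,π(s))\left(r+γv_{k}(s')\right)$$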

#### Pseudocode
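A minimal NumPy sketch of the synchronous backup above. The array layout (`P[a][s][s']` for transitions, `R[a][s]` for expected immediate reward, `pi[s][a]` for the policy) and the tiny two-state MDP at the bottom are illustrative assumptions, not from any particular course.

```python
import numpy as np

def policy_evaluation(P, R, pi, gamma=0.9, theta=1e-8):
    """Iterative policy evaluation via the Bellman expectation backup.

    P[a, s, s'] : transition probability p(s' | s, a)
    R[a, s]     : expected immediate reward for taking a in s
    pi[s, a]    : probability the policy picks a in s
    """
    n_actions, n_states = P.shape[0], P.shape[1]
    v = np.zeros(n_states)  # v_0(s) = 0 for all s
    while True:
        # Synchronous backup: v_{k+1} is computed entirely from v_k.
        v_new = np.zeros(n_states)
        for s in range(n_states):
            for a in range(n_actions):
                # pi(a|s) * (E[r] + gamma * sum_{s'} p(s'|s,a) v_k(s'))
                v_new[s] += pi[s, a] * (R[a, s] + gamma * P[a, s] @ v)
        if np.max(np.abs(v_new - v)) < theta:  # sup-norm stopping rule
            return v_new
        v = v_new

# Toy 2-state chain, one action: state 1 is absorbing with zero reward.
P = np.array([[[0.0, 1.0],
               [0.0, 1.0]]])
R = np.array([[1.0, 0.0]])
pi = np.array([[1.0], [1.0]])
v = policy_evaluation(P, R, pi, gamma=0.5)
# Expected fixed point: v(0) = 1 + 0.5 * v(1) = 1, v(1) = 0.
```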

Question: the Stanford notes say to initialize $v(s)$ to 0, but here it is initialized arbitrarily, except that the terminal states' value is set to 0. Is that just for faster convergence? #gap-in-knowledge