Policy Evaluation

Problem: Evaluate a given policy $π$.

Solution: Iterative application of the Bellman expectation backup.

$v_{1} \to v_{2} \to \dots \to v_{\infty} = v_{π}$

Using synchronous backups:
- At each iteration, for all states $s \in S$,
- update $v_{k+1}(s)$ from $v_{k}(s')$, where $s'$ is a successor state of $s$:

$$v_{k+1}(s) = \sum_{a} π(a ∣ s) \sum_{s', r} p(s', r ∣ s, a) \left( r + γ v_{k}(s') \right)$$

Make sure to know this equation (the Bellman expectation backup) by heart, along with how it differs from the Bellman optimality equation.
You should also know the corresponding equation for the deterministic case.
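
For comparison, here are the two related forms written in the same notation. This is a sketch under one assumption: that "deterministic" refers to a deterministic policy, where $π(s)$ denotes the single action taken in state $s$.

Bellman optimality backup (the policy-weighted sum over actions becomes a max over actions):

$$v_{k+1}(s) = \max_{a} \sum_{s', r} p(s', r ∣ s, a) \left( r + γ v_{k}(s') \right)$$

Bellman expectation backup for a deterministic policy (the outer sum collapses to the one action $π(s)$):

$$v_{k+1}(s) = \sum_{s', r} p(s', r ∣ s, π(s)) \left( r + γ v_{k}(s') \right)$$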

Pseudocode
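
A minimal Python sketch of synchronous iterative policy evaluation. Several things here are assumptions for illustration, not from these notes: the MDP is a hypothetical list of dicts `P[s][a]` holding `(prob, next_state, reward)` tuples, the policy is `pi[s][a]`, terminal states are modelled as having no actions, and the 3-state MDP is made up. This sketch starts every value at 0, which is only one possible initialization.

```python
def policy_evaluation(P, pi, gamma=1.0, theta=1e-10, max_sweeps=100_000):
    """Synchronous iterative policy evaluation (Bellman expectation backup).

    P[s][a]  : list of (prob, next_state, reward) tuples (assumed layout)
    pi[s][a] : probability of taking action a in state s
    A terminal state has an empty action dict, so its value stays 0.
    """
    n_states = len(P)
    v = [0.0] * n_states
    for _ in range(max_sweeps):
        # Synchronous backup: v_new is built only from the old array v.
        v_new = [0.0] * n_states
        for s in range(n_states):
            for a, transitions in P[s].items():
                for prob, s_next, reward in transitions:
                    v_new[s] += pi[s][a] * prob * (reward + gamma * v[s_next])
        delta = max(abs(x - y) for x, y in zip(v_new, v))
        v = v_new
        if delta < theta:  # stop once a full sweep barely changes anything
            break
    return v


# Made-up MDP: states 0 and 1 are non-terminal, state 2 is terminal.
P = [
    {0: [(1.0, 0, -1.0)], 1: [(1.0, 1, -1.0)]},  # state 0: stay, or move to 1
    {0: [(1.0, 0, -1.0)], 1: [(1.0, 2, 0.0)]},   # state 1: back to 0, or finish
    {},                                          # state 2: terminal
]
# Uniform random policy over the two actions.
pi = [{0: 0.5, 1: 0.5}, {0: 0.5, 1: 0.5}, {}]

v = policy_evaluation(P, pi, gamma=1.0)
print([round(x, 4) for x in v])
```

For this toy MDP with $γ = 1$ the result can be checked by hand: solving the two linear Bellman equations gives $v_{π}(0) = -5$ and $v_{π}(1) = -3$, with the terminal state at 0.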

Question: the Stanford material says to initialize $v(s)$ to 0, but here it is random, except that the terminal state's value is still 0. Is the zero initialization just for faster convergence? #gap-in-knowledge
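
One way to probe this question is a small experiment. For $γ < 1$ the Bellman expectation backup has a single fixed point $v_{π}$, so the starting values of non-terminal states affect only how many sweeps are needed, not the answer. The terminal state is different: nothing follows it, so its value is 0 by definition, and if the update loop skips terminal states a wrong starting value there is never corrected. The sketch below assumes the same hypothetical `P[s][a]` layout of `(prob, next_state, reward)` tuples and a made-up 3-state MDP; `evaluate` is an illustrative helper, not a library function.

```python
import random


def evaluate(P, pi, v_init, gamma=0.9, theta=1e-10, max_sweeps=10_000):
    """Synchronous policy evaluation from v_init; returns (values, sweeps).

    Only non-terminal states (those with actions) are updated, so whatever
    value a terminal state starts with is carried through unchanged.
    """
    v = list(v_init)
    for sweep in range(1, max_sweeps + 1):
        v_new = list(v)  # terminal entries are copied, never recomputed
        for s, actions in enumerate(P):
            if not actions:
                continue  # terminal state: skip the backup
            v_new[s] = sum(
                pi[s][a] * prob * (reward + gamma * v[s_next])
                for a, transitions in actions.items()
                for prob, s_next, reward in transitions
            )
        delta = max(abs(x - y) for x, y in zip(v_new, v))
        v = v_new
        if delta < theta:
            return v, sweep
    return v, max_sweeps


# Made-up MDP: states 0 and 1 are non-terminal, state 2 is terminal.
P = [
    {0: [(1.0, 0, -1.0)], 1: [(1.0, 1, -1.0)]},
    {0: [(1.0, 0, -1.0)], 1: [(1.0, 2, 0.0)]},
    {},
]
pi = [{0: 0.5, 1: 0.5}, {0: 0.5, 1: 0.5}, {}]

rng = random.Random(0)  # fixed seed so the run is repeatable
v_zero, sweeps_zero = evaluate(P, pi, [0.0, 0.0, 0.0])
v_rand, sweeps_rand = evaluate(
    P, pi, [rng.uniform(-10, 10), rng.uniform(-10, 10), 0.0]
)
# Deliberately wrong: the terminal state starts at 10 instead of 0.
v_bad, _ = evaluate(P, pi, [0.0, 0.0, 10.0])

print("zero init  :", [round(x, 4) for x in v_zero], "sweeps:", sweeps_zero)
print("random init:", [round(x, 4) for x in v_rand], "sweeps:", sweeps_rand)
print("bad terminal:", [round(x, 4) for x in v_bad])
```

The zero and random runs should agree on the non-terminal values up to the tolerance, while the run with a non-zero terminal value settles on different numbers. The sweep counts show how initialization changes the speed of convergence on this particular example.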

Policy Improvement
Generalized Policy Iteration

🛠️ Steven Gong
