Model-Based Policy Evaluation
Making this page because I keep getting confused.
Okay so Policy Evaluation is confusing to me because the name sounds like we are judging the policy itself, but what we actually do is take a fixed policy and compute its value function accurately.
The way we do this is through Dynamic Programming, because we have full knowledge of the MDP.
So we use the Bellman Equations and apply backups to the value estimates, though I have yet to do that in code. Because we know the MDP, the problem can also be solved exactly.
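A minimal sketch of what this might look like in code, assuming a made-up 2-state, 2-action MDP. The arrays `P`, `R`, `pi`, the discount `gamma`, and all three function names are hypothetical and only there for illustration. It shows both routes: repeated Bellman expectation backups until the values stop changing, and the exact solution from the linear system v = r_pi + gamma * P_pi v.

```python
import numpy as np

# Hypothetical MDP, all numbers invented for illustration.
# P[a, s, t] = probability of moving from state s to state t under action a.
P = np.array([
    [[0.9, 0.1], [0.2, 0.8]],  # action 0
    [[0.5, 0.5], [0.0, 1.0]],  # action 1
])
# R[s, a] = expected immediate reward for taking action a in state s.
R = np.array([[1.0, 0.0], [0.0, 2.0]])
# pi[s, a] = probability that the policy picks action a in state s.
pi = np.array([[0.5, 0.5], [0.1, 0.9]])
gamma = 0.9  # discount factor (assumed)


def policy_model(P, R, pi):
    """Average the MDP over the policy, giving (P_pi, r_pi)."""
    # P_pi[s, t] = sum_a pi[s, a] * P[a, s, t]
    P_pi = np.einsum("sa,ast->st", pi, P)
    # r_pi[s] = sum_a pi[s, a] * R[s, a]
    r_pi = (pi * R).sum(axis=1)
    return P_pi, r_pi


def iterative_policy_evaluation(P, R, pi, gamma, tol=1e-10, max_iters=10_000):
    """Apply Bellman expectation backups until the values stop changing."""
    P_pi, r_pi = policy_model(P, R, pi)
    v = np.zeros(len(r_pi))
    for _ in range(max_iters):
        v_new = r_pi + gamma * P_pi @ v  # one backup for every state at once
        if np.max(np.abs(v_new - v)) < tol:
            return v_new
        v = v_new
    return v


def exact_policy_evaluation(P, R, pi, gamma):
    """Solve (I - gamma * P_pi) v = r_pi directly."""
    P_pi, r_pi = policy_model(P, R, pi)
    return np.linalg.solve(np.eye(len(r_pi)) - gamma * P_pi, r_pi)


v_iter = iterative_policy_evaluation(P, R, pi, gamma)
v_exact = exact_policy_evaluation(P, R, pi, gamma)
print(v_iter)
print(v_exact)
```

The two results should agree up to the tolerance. The direct solve is only practical for small state spaces, which is one reason the iterative backups are the usual choice.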