Value-Based Methods
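This covers most of what I’ve learned from David Silver’s course. These methods are all based on the Bellman equation, and they use Generalized Policy Iteration to improve a policy through its value function: evaluate the current policy's values, then make the policy greedier with respect to them, and repeat.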
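For reference, here is the Bellman equation for action values, written out in the standard textbook form. The notation is my assumption, since these notes don't fix any: $\pi$ is the policy, $\gamma$ the discount factor, $R_{t+1}$ the reward, and $S_t$, $A_t$ the state and action at time $t$.

```latex
% Bellman expectation equation for the action-value function of a policy \pi
q_\pi(s, a) = \mathbb{E}_\pi\left[ R_{t+1} + \gamma \, q_\pi(S_{t+1}, A_{t+1}) \mid S_t = s, A_t = a \right]

% Bellman optimality equation for the optimal action-value function
q_*(s, a) = \mathbb{E}\left[ R_{t+1} + \gamma \max_{a'} q_*(S_{t+1}, a') \mid S_t = s, A_t = a \right]
```

Roughly speaking, Sarsa's update samples the first equation and Q-Learning's update samples the second.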
Methods:
- Q-Learning (off-policy)
- Sarsa (on-policy)
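The difference between the two comes down to one term in the update target. Below is a minimal tabular sketch, not a definitive implementation. The function names, the dictionary-based table `Q`, and the defaults for `alpha` (step size) and `gamma` (discount) are all my own assumptions for illustration, and terminal states are not handled.

```python
from collections import defaultdict


def q_learning_update(Q, s, a, r, s_next, actions, alpha=0.1, gamma=0.9):
    """Off-policy: bootstrap from the best action in s_next,
    regardless of which action the agent actually takes next."""
    target = r + gamma * max(Q[(s_next, b)] for b in actions)
    Q[(s, a)] += alpha * (target - Q[(s, a)])


def sarsa_update(Q, s, a, r, s_next, a_next, alpha=0.1, gamma=0.9):
    """On-policy: bootstrap from a_next, the action the current
    behaviour policy actually chose in s_next."""
    target = r + gamma * Q[(s_next, a_next)]
    Q[(s, a)] += alpha * (target - Q[(s, a)])


def make_q_table():
    """Hypothetical helper: a Q-table where unseen (state, action) pairs start at 0."""
    return defaultdict(float)
```

In a training loop, both functions would be called once per step after observing `(s, a, r, s_next)`. Sarsa additionally needs `a_next` to be picked first (for example epsilon-greedily), which is why its name spells out S, A, R, S, A.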