Temporal-Difference Control
- On-Policy TD Control is known as Sarsa
- Off-Policy TD Control is known as Q-Learning
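
The two methods differ only in the bootstrap target. Here is a minimal sketch, assuming a tabular Q stored as a list of lists indexed as Q[state][action]. The function names, the step size alpha, the discount gamma, and the done flag are illustrative assumptions, not part of these notes.

```python
def sarsa_update(Q, s, a, r, s_next, a_next, alpha=0.1, gamma=0.99, done=False):
    """On-policy: bootstrap from the action the agent actually takes next."""
    # Terminal transitions have no successor value to bootstrap from.
    target = r if done else r + gamma * Q[s_next][a_next]
    Q[s][a] += alpha * (target - Q[s][a])
    return Q


def q_learning_update(Q, s, a, r, s_next, alpha=0.1, gamma=0.99, done=False):
    """Off-policy: bootstrap from the greedy action, whatever is taken next."""
    target = r if done else r + gamma * max(Q[s_next])
    Q[s][a] += alpha * (target - Q[s][a])
    return Q
```

Sarsa needs the next action a_next before it can update, which ties the learned values to the behaviour policy (including its exploration). Q-Learning takes the max over next actions instead, so it learns about the greedy policy regardless of how actions are chosen.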
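TD Learning offers several advantages over Monte Carlo (MC):
- Lower Variance: the target depends on one sampled step rather than a full return, at the cost of some bias from bootstrapping
- Online: values can be updated after every step, while the episode is still running
- Can handle incomplete sequences (MC needs a full episode run before it can update)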
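
To make the contrast concrete, here is a rough sketch of the two value updates for state-value prediction. It assumes a tabular V stored as a list indexed by state. The names td0_update and mc_update, the (state, reward) episode format, and the alpha and gamma defaults are hypothetical choices for illustration.

```python
def td0_update(V, s, r, s_next, alpha=0.1, gamma=0.99, done=False):
    """TD(0): update from a single transition, as soon as it is observed."""
    target = r if done else r + gamma * V[s_next]
    V[s] += alpha * (target - V[s])
    return V


def mc_update(V, episode, alpha=0.1, gamma=0.99):
    """Every-visit MC: needs the finished episode to compute each return.

    episode is a list of (state, reward) pairs, where reward is the
    reward received after leaving that state.
    """
    G = 0.0
    # Walk backwards so the return G accumulates from the end of the episode.
    for s, r in reversed(episode):
        G = r + gamma * G
        V[s] += alpha * (G - V[s])
    return V
```

td0_update can be called inside the interaction loop after every step, whereas mc_update can only run once the episode has terminated.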