Model-Free Control

Temporal-Difference Control

  • On-Policy TD Control is known as Sarsa
  • Off-Policy TD Control is known as Q-Learning
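
As a minimal sketch of how the two methods differ, the tabular updates below assume a Q-table stored as a dict keyed by (state, action). The function names, the step size alpha, and the discount gamma are illustrative choices, not taken from these notes. Sarsa bootstraps from the action the agent actually takes next; Q-Learning bootstraps from the greedy action, whatever the behaviour policy does.

```python
from collections import defaultdict


def sarsa_update(Q, s, a, r, s_next, a_next, alpha=0.1, gamma=0.9, done=False):
    """On-policy update: the target uses a_next, the action actually chosen in s_next."""
    target = r if done else r + gamma * Q[(s_next, a_next)]
    Q[(s, a)] += alpha * (target - Q[(s, a)])


def q_learning_update(Q, s, a, r, s_next, actions, alpha=0.1, gamma=0.9, done=False):
    """Off-policy update: the target uses the best action in s_next, whatever is taken."""
    target = r if done else r + gamma * max(Q[(s_next, b)] for b in actions)
    Q[(s, a)] += alpha * (target - Q[(s, a)])


# Hypothetical two-action example: same transition, different targets.
Q_sarsa = defaultdict(float, {("s1", "left"): 1.0, ("s1", "right"): 2.0})
Q_qlearn = defaultdict(float, {("s1", "left"): 1.0, ("s1", "right"): 2.0})
sarsa_update(Q_sarsa, "s0", "right", 1.0, "s1", "left", alpha=0.5, gamma=1.0)
q_learning_update(Q_qlearn, "s0", "right", 1.0, "s1", ["left", "right"], alpha=0.5, gamma=1.0)
```

In this example the agent's next action is "left" (value 1.0), so Sarsa's target is 1 + 1 = 2, while Q-Learning's target uses the greedy value 2.0 and is 1 + 2 = 3.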

TD learning offers several advantages over Monte Carlo (MC) learning:

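  • Lower variance: the TD target depends on a single reward and transition rather than a whole return (at the cost of some bias from bootstrapping)
  • Online: values can be updated after every step, not only at the end of an episode
  • Can learn from incomplete sequences (MC must wait for a complete episode)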
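
The online and incomplete-sequence points can be seen in a small sketch of the two state-value updates. This assumes a value table V stored as a plain dict and an episode given as a list of (state, reward) pairs. Both function names and the parameters alpha and gamma are illustrative assumptions. The TD(0) update needs only one transition; the every-visit MC update needs the whole episode before it can compute any return.

```python
def td0_update(V, s, r, s_next, alpha=0.1, gamma=1.0, done=False):
    """TD(0): update V[s] from a single transition, bootstrapping from V[s_next]."""
    target = r if done else r + gamma * V.get(s_next, 0.0)
    v = V.get(s, 0.0)
    V[s] = v + alpha * (target - v)


def mc_update(V, episode, alpha=0.1, gamma=1.0):
    """Every-visit MC: needs the complete episode to compute each return G."""
    G = 0.0
    # Walk backwards so G accumulates the discounted reward from each state onward.
    for s, r in reversed(episode):
        G = r + gamma * G
        v = V.get(s, 0.0)
        V[s] = v + alpha * (G - v)


# TD can update as soon as one step (A -> B, reward 1) has been observed.
V_td = {"B": 1.0}
td0_update(V_td, "A", 1.0, "B", alpha=0.5)

# MC has to wait until the episode A -> B -> terminal has finished.
V_mc = {}
mc_update(V_mc, [("A", 1.0), ("B", 2.0)], alpha=0.5)
```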