Sarsa is an on-policy TD control algorithm.
The idea is very similar to what we did with Monte-Carlo Control: replace the policy evaluation step with TD learning, and keep the same epsilon-greedy policy improvement.
Pseudocode
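As a rough illustration of the loop, here is a minimal tabular sketch in Python. The chain environment (`step`), the tie-breaking rule, the `max_steps` cap and all hyperparameter values are my own assumptions for the example, not part of any reference implementation. The part that is Sarsa itself is the update `Q(S, A) += alpha * (R + gamma * Q(S', A') - Q(S, A))`, where `A'` is chosen by the same epsilon-greedy policy that is being improved.

```python
import random


def epsilon_greedy(Q, state, n_actions, epsilon, rng):
    """Pick a random action with probability epsilon, else a greedy one."""
    if rng.random() < epsilon:
        return rng.randrange(n_actions)
    best = max(Q[(state, a)] for a in range(n_actions))
    # Break ties at random so the untrained agent does not always go left.
    best_actions = [a for a in range(n_actions) if Q[(state, a)] == best]
    return rng.choice(best_actions)


def step(state, action, n_states):
    """Hypothetical chain environment: action 1 moves right, action 0 moves left.

    Reaching the last state ends the episode with reward 1; all other
    transitions give reward 0.
    """
    if action == 1:
        next_state = min(state + 1, n_states - 1)
    else:
        next_state = max(state - 1, 0)
    done = next_state == n_states - 1
    reward = 1.0 if done else 0.0
    return next_state, reward, done


def sarsa(n_states=5, n_actions=2, episodes=500, alpha=0.1, gamma=0.9,
          epsilon=0.1, max_steps=1000, seed=0):
    rng = random.Random(seed)
    Q = {(s, a): 0.0 for s in range(n_states) for a in range(n_actions)}

    for _ in range(episodes):
        state = 0
        action = epsilon_greedy(Q, state, n_actions, epsilon, rng)
        # max_steps is a safety cap (an assumption of this sketch).
        for _ in range(max_steps):
            next_state, reward, done = step(state, action, n_states)
            # On-policy: the next action comes from the same policy.
            next_action = epsilon_greedy(Q, next_state, n_actions, epsilon, rng)
            # TD target uses Q(S', A'); no bootstrapping from a terminal state.
            target = reward if done else reward + gamma * Q[(next_state, next_action)]
            Q[(state, action)] += alpha * (target - Q[(state, action)])
            state, action = next_state, next_action
            if done:
                break
    return Q
```

Calling `sarsa()` returns the learned table as a dict keyed by `(state, action)`. On this toy chain, moving right from the state next to the goal should end up with a higher value than moving left.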
Sarsa($\lambda$)
Okay, I don’t understand this, just like I don’t quite understand TD(λ).