Semi-Gradient Algorithms

These stand in contrast with Residual Algorithms. I first saw this in this reddit post while trying to understand the difference between using the MSBE and the raw Bellman optimality backup (answer: you can't use the latter for a gradient update; it only makes sense in the tabular setting).

Semi-gradient methods use the same squared error but treat the target as fixed, i.e. if you look at the MSBE loss:

$$\overline{\text{BE}}(\theta) = \mathbb{E}\Big[\big(r + \gamma \max_{a'} Q_\theta(s', a') - Q_\theta(s, a)\big)^2\Big]$$

  • We use $y = r + \gamma \max_{a'} Q_\theta(s', a')$, which is a fixed target: no gradient flows through it
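To see what each choice means for the gradient, write the TD error as $\delta = r + \gamma \max_{a'} Q_\theta(s', a') - Q_\theta(s, a)$ (the $a^*$ shorthand below is mine). The full residual gradient differentiates through both terms, while the semi-gradient simply drops the term coming from the target:

$$\nabla_\theta \overline{\text{BE}}(\theta) = 2\,\mathbb{E}\Big[\delta\,\big(\gamma\,\nabla_\theta Q_\theta(s', a^*) - \nabla_\theta Q_\theta(s, a)\big)\Big], \qquad a^* = \arg\max_{a'} Q_\theta(s', a')$$

$$\nabla_\theta^{\text{semi}}(\theta) = -2\,\mathbb{E}\big[\delta\,\nabla_\theta Q_\theta(s, a)\big]$$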

Residual: “I’ll try to fully minimize the Bellman error, including how my parameters affect the next-state value.”

Semi-gradient: “I’ll only match my current value to a fixed backup target, pretending the target doesn’t move with me.”
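Here's a minimal numpy sketch of both updates, assuming linear function approximation over state values, $v_w(s) = w^\top \phi(s)$ (I've switched from $Q$ to $v$ to keep it short, and the names `phi`, `alpha`, `gamma` are illustrative, not from any particular library):

```python
import numpy as np

def semi_gradient_td_update(w, phi_s, phi_s_next, r, gamma, alpha):
    """Semi-gradient TD(0): treat the target r + gamma * v(s') as a
    constant, so the gradient only flows through v(s)."""
    delta = r + gamma * (w @ phi_s_next) - (w @ phi_s)
    # d v(s) / d w = phi_s; the target's dependence on w is ignored
    return w + alpha * delta * phi_s

def residual_gradient_update(w, phi_s, phi_s_next, r, gamma, alpha):
    """Residual gradient: differentiate the full squared Bellman error,
    including how w moves the next-state value v(s')."""
    delta = r + gamma * (w @ phi_s_next) - (w @ phi_s)
    # Full gradient of delta**2 w.r.t. w is 2 * delta * (gamma * phi_s_next - phi_s)
    return w + alpha * delta * (phi_s - gamma * phi_s_next)

# Toy transition with made-up features, just to show the call signatures
rng = np.random.default_rng(0)
w = np.zeros(4)
phi_s, phi_s_next = rng.standard_normal(4), rng.standard_normal(4)
w = semi_gradient_td_update(w, phi_s, phi_s_next, r=1.0, gamma=0.99, alpha=0.1)
```

Note the only difference is the direction vector multiplying $\delta$: the semi-gradient update pretends $\phi(s')$ doesn't exist, which is exactly the "target doesn't move with me" assumption above.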