Semi-Gradient Algorithms

These stand in contrast with Residual Algorithms. I first saw this in this reddit post while trying to understand the difference between using the MSBE and the raw Bellman optimality backup (answer: you can't use the latter for a gradient update; it only makes sense in the tabular setting).

Semi-gradient methods use the same squared error but treat the target as fixed, i.e. if you look at the MSBE loss:

$$\overline{\text{BE}}(\theta) = \mathbb{E}\Big[\big(r + \gamma \max_{a'} Q_\theta(s', a') - Q_\theta(s, a)\big)^2\Big]$$

  • We use $y = r + \gamma \max_{a'} Q_\theta(s', a')$, which is a fixed target: no gradient flows through it
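To see what each choice means for the gradient, write the TD error as $\delta = r + \gamma \max_{a'} Q_\theta(s', a') - Q_\theta(s, a)$ (the $a^*$ shorthand below is mine). The full residual gradient differentiates through both terms, while the semi-gradient simply drops the term coming from the target:

$$\nabla_\theta \overline{\text{BE}}(\theta) = 2\,\mathbb{E}\Big[\delta\,\big(\gamma\,\nabla_\theta Q_\theta(s', a^*) - \nabla_\theta Q_\theta(s, a)\big)\Big], \qquad a^* = \arg\max_{a'} Q_\theta(s', a')$$

$$\nabla_\theta^{\text{semi}}(\theta) = -2\,\mathbb{E}\big[\delta\,\nabla_\theta Q_\theta(s, a)\big]$$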

Residual: “I’ll try to fully minimize the Bellman error, including how my parameters affect the next-state value.”

Semi-gradient: “I’ll only match my current value to a fixed backup target, pretending the target doesn’t move with me.”
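Here's a minimal numpy sketch of both updates, assuming linear function approximation over state values, $v_w(s) = w^\top \phi(s)$ (I've switched from $Q$ to $v$ to keep it short, and the names `phi`, `alpha`, `gamma` are illustrative, not from any particular library):

```python
import numpy as np

def semi_gradient_td_update(w, phi_s, phi_s_next, r, gamma, alpha):
    """Semi-gradient TD(0): treat the target r + gamma * v(s') as a
    constant, so the gradient only flows through v(s)."""
    delta = r + gamma * (w @ phi_s_next) - (w @ phi_s)
    # d v(s) / d w = phi_s; the target's dependence on w is ignored
    return w + alpha * delta * phi_s

def residual_gradient_update(w, phi_s, phi_s_next, r, gamma, alpha):
    """Residual gradient: differentiate the full squared Bellman error,
    including how w moves the next-state value v(s')."""
    delta = r + gamma * (w @ phi_s_next) - (w @ phi_s)
    # Full gradient of delta**2 w.r.t. w is 2 * delta * (gamma * phi_s_next - phi_s)
    return w + alpha * delta * (phi_s - gamma * phi_s_next)

# Toy transition with made-up features, just to show the call signatures
rng = np.random.default_rng(0)
w = np.zeros(4)
phi_s, phi_s_next = rng.standard_normal(4), rng.standard_normal(4)
w = semi_gradient_td_update(w, phi_s, phi_s_next, r=1.0, gamma=0.99, alpha=0.1)
```

Note the only difference is the direction vector multiplying $\delta$: the semi-gradient update pretends $\phi(s')$ doesn't exist, which is exactly the "target doesn't move with me" assumption above.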