🛠️ Steven Gong

Jul 17, 2025, 1 min read

Bellman Error

$L(\phi, D) = \mathbb{E}_{(s, a, r, s', d) \sim D} \left[ \left( Q_{\phi}(s, a) - \left( r + \gamma (1 - d) \max_{a'} Q_{\phi}(s', a') \right) \right)^{2} \right]$

https://spinningup.openai.com/en/latest/algorithms/ddpg.html
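
As a rough illustration of the loss above, here is a minimal sketch that evaluates it on a small batch. It assumes a tabular $Q$ stored as a nested list `q[s][a]` and a batch of `(s, a, r, s_next, d)` tuples, where `d` is 1 when `s_next` is terminal and 0 otherwise. The function name and the data layout are hypothetical choices for this example, not something taken from the linked page.

```python
def mean_squared_bellman_error(batch, q, gamma=0.99):
    """Average squared Bellman error over a batch of transitions.

    batch: iterable of (s, a, r, s_next, d) tuples (hypothetical layout)
    q:     tabular Q-values indexed as q[state][action] (an assumption)
    gamma: discount factor
    """
    total = 0.0
    for s, a, r, s_next, d in batch:
        # Bootstrapped target: r + gamma * (1 - d) * max_a' Q(s', a').
        # The (1 - d) factor drops the bootstrap term at terminal states.
        target = r + gamma * (1 - d) * max(q[s_next])
        total += (q[s][a] - target) ** 2
    # The sample mean stands in for the expectation over D.
    return total / len(batch)


# Two states, two actions; the second transition is terminal (d = 1).
q = [[1.0, 0.0], [0.5, 1.5]]
batch = [(0, 0, 1.0, 1, 0), (1, 1, 0.0, 0, 1)]
loss = mean_squared_bellman_error(batch, q, gamma=0.5)
print(loss)  # 1.40625
```

With function approximation, `q[s][a]` would be replaced by a network forward pass and the target is usually treated as a constant when differentiating with respect to $\phi$.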
