🛠️ Steven Gong


Aug 16, 2025, 1 min read

A Minimalist Approach to Offline Reinforcement Learning (TD3 + BC)

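This paper's core contribution is simply to add a behavior cloning (BC) loss to the TD3 policy update (a DDPG-style actor objective), and it shows that this small change yields a large improvement in offline RL, i.e.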

$\pi = \arg\max_{\pi} \mathbb{E}_{(s, a) \sim \mathcal{D}} \left[ \lambda Q(s, \pi(s)) - (\pi(s) - a)^{2} \right]$

This form of the objective is the one shown in Is Value Learning Really the Main Bottleneck in Offline RL?
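To make the objective concrete, here is a minimal sketch of it as a loss to minimize (the negated objective, averaged over a batch). It is written in plain Python for illustration only and is not the paper's implementation: the names `td3_bc_actor_loss`, `policy`, `q_fn`, and `lam` are hypothetical, and the toy policy and Q-function are made up so the numbers can be checked by hand.

```python
from typing import Callable, Sequence

Vector = Sequence[float]


def td3_bc_actor_loss(
    policy: Callable[[Vector], Vector],      # pi(s): deterministic policy (assumed interface)
    q_fn: Callable[[Vector, Vector], float],  # Q(s, a): learned critic (assumed interface)
    states: Sequence[Vector],
    actions: Sequence[Vector],
    lam: float,
) -> float:
    """Batch mean of -(lam * Q(s, pi(s)) - ||pi(s) - a||^2)."""
    total = 0.0
    for s, a in zip(states, actions):
        pi_s = policy(s)
        # BC term: squared distance between the policy action and the dataset action.
        bc_term = sum((p - x) ** 2 for p, x in zip(pi_s, a))
        # RL term: critic value of the policy action, weighted by lam.
        rl_term = lam * q_fn(s, pi_s)
        total += -rl_term + bc_term
    return total / len(states)


# Toy setup (hypothetical): 1-D states and actions.
toy_policy = lambda s: [0.5 * s[0]]
toy_q = lambda s, a: -((a[0] - 1.0) ** 2)

states = [[2.0], [4.0]]
actions = [[1.0], [1.0]]

loss_rl_and_bc = td3_bc_actor_loss(toy_policy, toy_q, states, actions, lam=1.0)
loss_bc_only = td3_bc_actor_loss(toy_policy, toy_q, states, actions, lam=0.0)
print(loss_rl_and_bc, loss_bc_only)
```

Setting `lam=0.0` reduces the loss to pure behavior cloning, while larger values of `lam` weight the Q-term more heavily relative to staying close to the dataset actions.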


