🛠️ Steven Gong

Advantage-Weighted Regression (AWR)
Jul 09, 2025, 1 min read

Saw this from the online batch RL paper.

$\theta^{*} = \arg\max_{\theta} \mathbb{E}_{(s, a) \sim D} \left[ e^{\beta (Q(s, a) - V(s))} \log \pi_{\theta}(a \mid s) \right]$

So instead of naive BC, we reweight each $(s, a)$ sample in the dataset by its exponentiated advantage $e^{\beta A(s, a)}$, where $A(s, a) = Q(s, a) - V(s)$.
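
A quick sanity check on the role of $\beta$ (a short derivation from the objective above, not something quoted from the paper): setting $\beta = 0$ makes every weight equal to 1, so the objective collapses to the plain BC maximum-likelihood objective.

$$
\beta = 0 \;\Rightarrow\; e^{\beta (Q(s, a) - V(s))} = 1 \;\Rightarrow\; \theta^{*} = \arg\max_{\theta} \mathbb{E}_{(s, a) \sim D} \left[ \log \pi_{\theta}(a \mid s) \right]
$$

For $\beta > 0$, samples with positive advantage get weight above 1 and samples with negative advantage get weight below 1.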

AWR learns a policy $\pi_{\theta}(a \mid s)$ by supervised learning on a dataset of $(s, a)$ pairs, but weights each pair by its exponentiated advantage. The original paper:

https://arxiv.org/abs/1910.00177

Learn a policy that imitates actions with high advantage, and suppresses actions with low advantage.
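
A minimal sketch of the weighted loss, assuming we already have per-sample log-probabilities $\log \pi_{\theta}(a \mid s)$ and estimates of $Q(s, a)$ and $V(s)$ for a batch. The function names `awr_weights` and `awr_loss` and the `max_weight` clipping are illustrative choices (clipping is a common stabilization trick, not part of the formula above), and the plain-list interface is just to keep the example dependency-free.

```python
import math


def awr_weights(q_values, v_values, beta=1.0, max_weight=20.0):
    """Per-sample weights exp(beta * (Q - V)), clipped at max_weight.

    The clipping threshold is an assumption for numerical stability;
    the objective in the note has no clipping.
    """
    weights = []
    for q, v in zip(q_values, v_values):
        advantage = q - v
        weights.append(min(math.exp(beta * advantage), max_weight))
    return weights


def awr_loss(log_probs, q_values, v_values, beta=1.0, max_weight=20.0):
    """Negative advantage-weighted log-likelihood, averaged over the batch.

    Minimizing this is equivalent to maximizing the weighted objective.
    In a real training loop the weights are treated as constants
    (no gradient flows through Q or V).
    """
    weights = awr_weights(q_values, v_values, beta, max_weight)
    total = sum(w * lp for w, lp in zip(weights, log_probs))
    return -total / len(log_probs)


# Two samples with the same log-probability but opposite advantages:
# the high-advantage sample contributes more to the loss.
log_probs = [math.log(0.5), math.log(0.5)]
q_values = [1.0, -1.0]
v_values = [0.0, 0.0]
print(awr_weights(q_values, v_values, beta=1.0))
print(awr_loss(log_probs, q_values, v_values, beta=1.0))
```

With `beta=0.0` every weight is 1 and `awr_loss` reduces to the ordinary BC negative log-likelihood.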

Related

Policy Extraction

