Actor-Critic Policy Gradient
The point of actor-critic methods is to decouple the policy-gradient update (the actor) from the Q-function update (the critic), combining the strengths of both families of methods:
- Policy-based methods optimize the policy directly, but their gradient estimates (e.g., Monte Carlo returns in REINFORCE) have high variance
- Value-based methods learn value functions efficiently, but they do not produce an explicit policy
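Concretely, in one common formulation (notation introduced here, not taken from these notes: $\pi_\theta$ is the policy with parameters $\theta$, $Q_w$ is the critic with parameters $w$), the actor follows an approximate policy gradient while the critic is fit with a TD-style update:

$$
\nabla_\theta J(\theta) \approx \mathbb{E}_{\pi_\theta}\!\left[\nabla_\theta \log \pi_\theta(a \mid s)\, Q_w(s, a)\right]
$$

$$
\delta = r + \gamma\, Q_w(s', a') - Q_w(s, a), \qquad w \leftarrow w + \beta\, \delta\, \nabla_w Q_w(s, a)
$$

The actor's step changes only $\theta$ and the critic's step changes only $w$, which is the decoupling referred to above.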
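Below is a minimal runnable sketch of this loop in Python with PyTorch and Gymnasium, assuming a discrete-action environment (CartPole-v1 here); every name and hyperparameter is illustrative rather than taken from these notes. It uses a state-value critic and treats the TD error as the advantage signal, a common variant; the Q-critic form above is the same pattern with a $Q(s,a)$ head and a SARSA-style target.

```python
# Minimal one-step actor-critic sketch (assumptions: PyTorch and Gymnasium
# installed, a discrete-action env such as CartPole-v1; all names and
# hyperparameters are illustrative, not prescribed by these notes).
import gymnasium as gym
import torch
import torch.nn as nn
from torch.distributions import Categorical

env = gym.make("CartPole-v1")
obs_dim = env.observation_space.shape[0]
n_actions = env.action_space.n

# Actor: parameters theta, maps state -> action logits (the policy).
actor = nn.Sequential(nn.Linear(obs_dim, 128), nn.Tanh(), nn.Linear(128, n_actions))
# Critic: parameters w, maps state -> V(s); its TD error serves as the advantage.
critic = nn.Sequential(nn.Linear(obs_dim, 128), nn.Tanh(), nn.Linear(128, 1))

actor_opt = torch.optim.Adam(actor.parameters(), lr=3e-3)
critic_opt = torch.optim.Adam(critic.parameters(), lr=3e-3)
gamma = 0.99

for episode in range(500):
    obs, _ = env.reset()
    done = False
    while not done:
        obs_t = torch.as_tensor(obs, dtype=torch.float32)
        dist = Categorical(logits=actor(obs_t))
        action = dist.sample()

        next_obs, reward, terminated, truncated, _ = env.step(action.item())
        done = terminated or truncated

        # One-step TD target: bootstrap from V(s') unless the episode terminated.
        value = critic(obs_t).squeeze(-1)
        with torch.no_grad():
            next_value = critic(torch.as_tensor(next_obs, dtype=torch.float32)).squeeze(-1)
            target = reward + gamma * next_value * (0.0 if terminated else 1.0)
        td_error = target - value

        # Critic step: regress V(s) toward the TD target (updates w only).
        critic_opt.zero_grad()
        td_error.pow(2).backward()
        critic_opt.step()

        # Actor step: policy gradient weighted by the detached TD error (updates theta only).
        actor_opt.zero_grad()
        (-dist.log_prob(action) * td_error.detach()).backward()
        actor_opt.step()

        obs = next_obs
```

Keeping separate networks and optimizers makes the decoupling explicit: the critic step touches only $w$, the actor step touches only $\theta$. In practice the two heads often share a torso, in which case their gradients mix in the shared layers.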