🛠️ Steven Gong



Feb 11, 2026, 1 min read

Policy Gradient Methods

Advantage Actor Critic (A2C)

https://huggingface.co/blog/deep-rl-a2c

A3C

From Lecture 3, "Policy Gradient and Advantage Estimation", of the Deep RL Foundation Series (slides here).

The method has two components, each with its own update: the value network (the critic), parameterized by $\phi$, and the policy network (the actor), parameterized by $\theta$.
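A minimal sketch of one actor-critic step on a single transition $(s, a, r, s')$, assuming tabular parameters in place of networks: `phi[s]` plays the role of $V_\phi(s)$ and `theta[s]` holds the softmax logits of $\pi_\theta(\cdot \mid s)$. The advantage is the one-step estimate $\hat{A} = r + \gamma V_\phi(s') - V_\phi(s)$. A3C itself uses n-step returns, neural networks and several asynchronous workers, none of which are shown here. The function names and learning rates are hypothetical.

```python
import math

# Hypothetical tabular stand-ins for the two networks (an assumption, not A3C's architecture):
#   phi[s]      -> V_phi(s), the critic
#   theta[s][a] -> logits of pi_theta(a | s), the actor

def softmax(logits):
    """Numerically stable softmax over a list of logits."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]

def actor_critic_step(theta, phi, s, a, r, s_next, done,
                      gamma=0.99, lr_actor=0.1, lr_critic=0.5):
    """One actor-critic update on a single transition; returns the advantage used."""
    # One-step bootstrapped target; terminal transitions do not bootstrap.
    target = r + (0.0 if done else gamma * phi[s_next])
    advantage = target - phi[s]  # one-step advantage estimate (TD error)

    # Critic (phi): gradient step on 0.5 * (target - V(s))^2 with the target held fixed.
    phi[s] += lr_critic * advantage

    # Actor (theta): policy-gradient step. For a softmax policy,
    # d log pi(a|s) / d logit_b = 1[b == a] - pi(b|s).
    probs = softmax(theta[s])
    for b in range(len(theta[s])):
        grad_log_pi = (1.0 if b == a else 0.0) - probs[b]
        theta[s][b] += lr_actor * advantage * grad_log_pi
    return advantage
```

With zero-initialized tables, a rewarded terminal transition from state 0 gives $\hat{A} = 1$, so $V_\phi(0)$ moves toward 1 and the probability of the action taken rises above 0.5. Both updates reuse the same advantage, computed once before either parameter set changes.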

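The update for $\phi$ is called fitted value iteration (more precisely, fitted $V^\pi$ iteration, since it evaluates the current policy rather than the optimal one): $V_\phi$ is repeatedly regressed onto targets $r + \gamma V_\phi(s')$ computed with the previous parameters.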
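A minimal sketch of fitted $V^\pi$ iteration on a fixed batch of transitions collected with the current policy, assuming a tabular value function so that the regression step has a closed form (the least-squares fit in each state is the mean of its targets). With a neural network $V_\phi$ one would instead take gradient steps on $\sum (y - V_\phi(s))^2$. The function name and the `(s, r, s_next, done)` tuple layout are hypothetical.

```python
from collections import defaultdict

def fitted_v_iteration(transitions, num_states, gamma=0.9, iters=50):
    """Fitted V^pi iteration on a fixed batch of (s, r, s_next, done) tuples.

    Assumption: a tabular value function stands in for the value network V_phi.
    """
    v = [0.0] * num_states  # V_phi_0
    for _ in range(iters):
        # 1) Build regression targets with the previous, frozen estimate V_phi_i.
        sums = defaultdict(float)
        counts = defaultdict(int)
        for s, r, s_next, done in transitions:
            y = r + (0.0 if done else gamma * v[s_next])
            sums[s] += y
            counts[s] += 1
        # 2) Fit V_phi_{i+1}: for a tabular model the least-squares
        #    solution is the mean target observed in each state.
        new_v = list(v)
        for s in counts:
            new_v[s] = sums[s] / counts[s]
        v = new_v
    return v
```

On a two-step chain where state 0 leads to state 1 with reward 0 and state 1 ends the episode with reward 1, the estimates settle at $V(1) = 1$ and $V(0) = \gamma$. Keeping the targets frozen within each iteration is what separates this from a single regression on fixed Monte Carlo returns.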


Created with Quartz, © 2026
