🛠️ Steven Gong

Jun 23, 2023, 1 min read

Policy Gradient Methods

Advantage Actor Critic (A2C)

https://huggingface.co/blog/deep-rl-a2c

A3C

From Lecture 3 (Policy Gradient and Advantage Estimation) of the Deep RL Foundation Series; slides here

  • Each iteration performs two updates
  • one updates the value network parameters ϕ, the other updates the policy network parameters θ
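
As a rough sketch of the two updates, the standard advantage actor-critic form is shown below. The step size α, the number of rollouts m, and the return estimate Q̂ are notation assumed here for illustration, not taken from the note; Q̂ could be, for example, a Monte Carlo or bootstrapped n-step return.

```latex
\begin{aligned}
\theta &\leftarrow \theta + \alpha \, \frac{1}{m} \sum_{k=1}^{m} \sum_{t}
  \nabla_\theta \log \pi_\theta\bigl(u_t^{(k)} \mid s_t^{(k)}\bigr)
  \Bigl( \hat{Q}\bigl(s_t^{(k)}, u_t^{(k)}\bigr) - V_\phi\bigl(s_t^{(k)}\bigr) \Bigr) \\
\phi &\leftarrow \arg\min_{\phi'} \sum_{k=1}^{m} \sum_{t}
  \Bigl( \hat{Q}\bigl(s_t^{(k)}, u_t^{(k)}\bigr) - V_{\phi'}\bigl(s_t^{(k)}\bigr) \Bigr)^2
\end{aligned}
```

The term in large parentheses in the first line is the advantage estimate: how much better the sampled action did than the value network predicted.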

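The update for ϕ is called fitted value iteration: V_ϕ is regressed onto bootstrapped targets r + γ V_ϕ(s′) that are computed with the current parameters.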
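
A minimal sketch of one fitted value iteration step, assuming a linear value function V_ϕ(s) = s · ϕ over feature vectors so the regression can be solved in closed form. The function name, argument names, and the linear model are illustrative assumptions; with a neural value network the least-squares solve would be replaced by a few gradient steps on the same squared error.

```python
import numpy as np


def fitted_value_iteration_step(phi, states, rewards, next_states, dones, gamma=0.99):
    """One fitted value iteration step for a linear V_phi(s) = s @ phi.

    states, next_states: (N, D) feature arrays
    rewards, dones:      (N,) arrays; dones is 1.0 where the episode ended
    (All names here are hypothetical, chosen for illustration.)
    """
    # Bootstrapped targets use the current parameters, held fixed.
    next_values = next_states @ phi
    targets = rewards + gamma * (1.0 - dones) * next_values

    # Regress V onto the targets: least-squares fit of states @ new_phi to targets.
    new_phi, *_ = np.linalg.lstsq(states, targets, rcond=None)
    return new_phi
```

Repeating the step feeds each new ϕ back in as the frozen parameters for the next round of targets.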
