Advantage Actor Critic (A2C)
https://huggingface.co/blog/deep-rl-a2c
A3C
From Lecture 3: Policy Gradient and Advantage Estimation from Deep RL Foundation Series, slides here
- So they have two things
- one is updating the value network , one is updating the policy network
The update for is called fitted Value Iteration