Proximal Policy Optimization (PPO)
Resources
- Lecture 4: TRPO, PPO from Deep RL Foundations, slides here
- https://lilianweng.github.io/posts/2018-04-08-policy-gradient/
- https://openai.com/blog/openai-baselines-ppo/
- https://arxiv.org/pdf/1707.06347.pdf
Other (explained well in the Spider-Man video)
- https://huggingface.co/blog/deep-rl-ppo#the-clipped-part-of-the-clipped-surrogate-objective-function
- Towards Delivering a Coherent Self-Contained Explanation of Proximal Policy Optimization (explanation paper)
- https://iclr-blog-track.github.io/2022/03/25/ppo-implementation-details/
Good reasons for using PPO:
- its comparatively high data efficiency
- its ability to handle both discrete and continuous action spaces
- its robust learning performance
There are 2 networks being trained:
- Value Network (the critic: estimates the value V(s) of a state)
- Policy Network (the actor: outputs a distribution over actions)
This is loosely reminiscent of the two networks in a GAN, but here the value network acts as a critic that helps estimate advantages for the policy update rather than as an adversary.
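A minimal sketch of these two networks and the losses PPO trains them with, assuming PyTorch and a discrete action space (the layer sizes and the `ppo_losses` helper are illustrative, not a reference implementation):

```python
import torch
import torch.nn as nn

class PolicyNetwork(nn.Module):
    """Actor: maps an observation to a distribution over discrete actions."""
    def __init__(self, obs_dim, n_actions, hidden=64):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(obs_dim, hidden), nn.Tanh(),
            nn.Linear(hidden, n_actions),
        )

    def forward(self, obs):
        return torch.distributions.Categorical(logits=self.net(obs))

class ValueNetwork(nn.Module):
    """Critic: maps an observation to a scalar value estimate V(s)."""
    def __init__(self, obs_dim, hidden=64):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(obs_dim, hidden), nn.Tanh(),
            nn.Linear(hidden, 1),
        )

    def forward(self, obs):
        return self.net(obs).squeeze(-1)

def ppo_losses(policy, value_fn, obs, actions, old_log_probs,
               advantages, returns, clip_eps=0.2):
    """Clipped surrogate loss for the policy, MSE regression loss for the value net."""
    dist = policy(obs)
    log_probs = dist.log_prob(actions)
    # Probability ratio pi_theta(a|s) / pi_theta_old(a|s)
    ratio = torch.exp(log_probs - old_log_probs)
    clipped = torch.clamp(ratio, 1.0 - clip_eps, 1.0 + clip_eps)
    # Take the pessimistic (minimum) of the clipped and unclipped objectives
    policy_loss = -torch.min(ratio * advantages, clipped * advantages).mean()
    value_loss = (value_fn(obs) - returns).pow(2).mean()
    return policy_loss, value_loss
```

In practice the two losses are usually combined (plus an entropy bonus) and optimized with minibatch SGD over data collected by the old policy, which is where the clipping matters: it keeps the new policy from moving too far from the one that gathered the data.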