Policy Gradient Methods

Proximal Policy Optimization (PPO)

Resources

Other (explained well in the Spider-Man video)

Good reasons for using PPO:

  • its comparatively high data efficiency for an on-policy method (each batch of experience is reused for several update epochs)
  • its ability to cope with both discrete and continuous action spaces
  • its robust learning performance

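Two networks are trained:

  1. Value network (the critic): estimates the expected return from a given state
  2. Policy network (the actor): outputs a probability distribution over actions for a given state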

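This loosely resembles the two networks used in GANs, but here the two are not adversaries: the value network's estimates are used to judge whether an action turned out better or worse than expected, and that judgement guides the policy update.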
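
To make the interplay between the two networks concrete, here is a minimal sketch of the two loss terms computed in one PPO update. It uses plain Python lists in place of real network outputs. The function name `ppo_losses`, the parameter `clip_eps=0.2`, and the simple advantage estimate (return minus predicted value) are assumptions for illustration, not taken from these notes. Practical implementations usually use an autodiff library and a more refined advantage estimator.

```python
import math

def ppo_losses(new_log_probs, old_log_probs, values, returns, clip_eps=0.2):
    """Return (policy_loss, value_loss) for one batch as plain floats.

    new_log_probs / old_log_probs: log-probability of each taken action under
    the current policy and under the policy that collected the data.
    values: value-network predictions for each state.
    returns: observed discounted returns for each state.
    """
    n = len(returns)
    policy_terms, value_terms = [], []
    for new_lp, old_lp, v, ret in zip(new_log_probs, old_log_probs, values, returns):
        # Assumed advantage estimate: how much better the outcome was than
        # the value network predicted.
        advantage = ret - v
        # Probability ratio between the current and the data-collecting policy.
        ratio = math.exp(new_lp - old_lp)
        clipped = max(1.0 - clip_eps, min(ratio, 1.0 + clip_eps))
        # Clipped surrogate: keep the more pessimistic of the two estimates,
        # which discourages moving the policy too far in one update.
        policy_terms.append(min(ratio * advantage, clipped * advantage))
        # The value network is trained by regression onto the observed return.
        value_terms.append((ret - v) ** 2)
    policy_loss = -sum(policy_terms) / n  # negated because we minimise a loss
    value_loss = sum(value_terms) / n
    return policy_loss, value_loss
```

The policy network is updated to reduce `policy_loss` and the value network to reduce `value_loss`. Unlike in a GAN, neither loss rewards one network for fooling the other.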