🛠️ Steven Gong

Jul 21, 2025, 1 min read

Policy Gradient Methods

Trust Region Policy Optimization (TRPO)

TRPO updates policies by taking the largest step possible to improve performance, while satisfying a special constraint on how close the new and old policies are allowed to be.

This constraint is expressed in terms of the KL Divergence.
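To make the constraint concrete, here is a minimal sketch of how the average KL divergence between a new and an old policy could be estimated over a batch of states. It assumes discrete (categorical) action distributions stored as plain lists of probabilities, and an old policy that gives every action non-zero probability. The names `categorical_kl`, `mean_kl`, and the threshold `delta = 0.05` are illustrative and do not come from the note.

```python
import math

def categorical_kl(p, q):
    """KL(p || q) for two discrete distributions given as probability lists.

    Assumes q has no zero entries wherever p is non-zero.
    """
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0.0)

def mean_kl(new_probs, old_probs):
    """Average KL(new || old) over a batch of states."""
    kls = [categorical_kl(p_new, p_old)
           for p_new, p_old in zip(new_probs, old_probs)]
    return sum(kls) / len(kls)

# Toy batch: action probabilities for two states under each policy.
old_probs = [[0.5, 0.5], [0.9, 0.1]]
new_probs = [[0.6, 0.4], [0.8, 0.2]]

delta = 0.05  # hypothetical trust-region size
kl = mean_kl(new_probs, old_probs)
within_trust_region = kl <= delta
print(kl, within_trust_region)
```

The KL is zero when the two policies agree and grows as the new policy moves away from the old one, which is what lets a single threshold bound the size of the update.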

Resources

  • https://spinningup.openai.com/en/latest/algorithms/trpo.html

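$$
\theta_{k+1} = \arg\max_{\theta} \; \mathcal{L}(\theta_k, \theta) \quad \text{s.t.} \quad \bar{D}_{KL}(\theta \,\|\, \theta_k) \le \delta
$$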
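As a rough illustration of this update rule, the sketch below evaluates a sample-based surrogate objective (mean of probability ratio times advantage) and the mean KL for a few candidate policies, then keeps the best candidate that satisfies the constraint. This is a toy under stated assumptions: the policies are tabular, the candidates are enumerated by hand, and the function names, data, and `delta` are all hypothetical. TRPO itself solves the problem approximately rather than by enumerating candidates.

```python
import math

def surrogate(new_probs, old_probs, actions, advantages):
    """Sample estimate of L(theta_k, theta): mean of ratio * advantage."""
    total = 0.0
    for p_new, p_old, a, adv in zip(new_probs, old_probs, actions, advantages):
        total += (p_new[a] / p_old[a]) * adv
    return total / len(actions)

def mean_kl(new_probs, old_probs):
    """Sample estimate of the average KL(new || old) over visited states."""
    total = 0.0
    for p_new, p_old in zip(new_probs, old_probs):
        total += sum(pn * math.log(pn / po)
                     for pn, po in zip(p_new, p_old) if pn > 0.0)
    return total / len(new_probs)

def best_feasible(candidates, old_probs, actions, advantages, delta):
    """Return (index, surrogate) of the best candidate with mean KL <= delta.

    Returns None if no candidate satisfies the constraint.
    """
    best = None
    for i, cand in enumerate(candidates):
        if mean_kl(cand, old_probs) > delta:
            continue  # outside the trust region
        value = surrogate(cand, old_probs, actions, advantages)
        if best is None or value > best[1]:
            best = (i, value)
    return best

# Toy batch: two states, two actions, one sampled action per state.
old_probs = [[0.5, 0.5], [0.5, 0.5]]
actions = [0, 1]
advantages = [1.0, -1.0]

candidates = [
    [[0.5, 0.5], [0.5, 0.5]],      # no change
    [[0.6, 0.4], [0.6, 0.4]],      # small step toward action 0
    [[0.99, 0.01], [0.99, 0.01]],  # large step toward action 0
]

delta = 0.05  # hypothetical trust-region size
choice = best_feasible(candidates, old_probs, actions, advantages, delta)
print(choice)
```

In this toy batch the large step has the highest surrogate value but violates the KL constraint, so the small step is selected. That is the trade-off the constraint encodes: improve as much as possible without moving too far from the old policy.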

Related

  • PPO
