🛠️ Steven Gong

Aug 24, 2025, 1 min read

Sample Efficiency

This is a core argument in On-Policy Methods vs Off-Policy Methods.

https://www.reddit.com/r/reinforcementlearning/comments/g4penl/lets_catch_em_all_what_are_reasons_for_the_poor/

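Measured per gradient step, on-policy methods are actually more efficient, but they cannot perform as many updates for the same amount of environment interaction. They also do not recycle data: once the policy has been updated, the transitions collected under the old policy are typically discarded, whereas off-policy methods keep them in a replay buffer and train on them repeatedly.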
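To make the update-count argument concrete, here is a minimal sketch that only counts gradient updates for a fixed budget of environment steps. It involves no real learning. The function names, the batch size, and the one-update-per-step replay schedule are illustrative assumptions, not taken from any particular algorithm or library.

```python
import random
from collections import deque


def on_policy_updates(num_env_steps, batch_size):
    """Count updates when each batch is used once and then discarded."""
    updates = 0
    batch = []
    for step in range(num_env_steps):
        batch.append(step)  # stand-in for a transition (s, a, r, s')
        if len(batch) == batch_size:
            updates += 1    # one gradient step on fresh data
            batch.clear()   # assumed: old-policy data is thrown away
    return updates


def off_policy_updates(num_env_steps, batch_size, capacity=10_000, seed=0):
    """Count updates when transitions stay in a replay buffer and are resampled."""
    rng = random.Random(seed)
    buffer = deque(maxlen=capacity)
    updates = 0
    for step in range(num_env_steps):
        buffer.append(step)
        if len(buffer) >= batch_size:
            # Assumed schedule: one update per environment step,
            # sampling a minibatch of old and new transitions.
            _minibatch = rng.sample(list(buffer), batch_size)
            updates += 1
    return updates


if __name__ == "__main__":
    steps, batch = 1000, 50
    print("on-policy updates: ", on_policy_updates(steps, batch))
    print("off-policy updates:", off_policy_updates(steps, batch))
```

Under these assumptions the same 1000 environment steps yield far more gradient updates in the replay-buffer setting, because every transition can appear in many minibatches instead of exactly one.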
