Sample Efficiency

Sample efficiency is a core argument in On-Policy Methods vs Off-Policy Methods.

https://www.reddit.com/r/reinforcementlearning/comments/g4penl/lets_catch_em_all_what_are_reasons_for_the_poor/

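Measured per gradient step, on-policy methods are actually more efficient, but they cannot perform as many updates for the same amount of collected experience. They also don't recycle data: each batch is thrown away once the policy has been updated, whereas off-policy methods can keep reusing old transitions.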
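To make the update-count gap concrete, here is a minimal sketch that only counts gradient updates for a fixed budget of environment steps. It is a toy under stated assumptions, not a real training loop: the function names, the batch size, the replay capacity, and the "one update per environment step" schedule are all hypothetical choices made for illustration, and transitions are stand-in integers.

```python
import random
from collections import deque


def on_policy_updates(num_env_steps, batch_size):
    """Count updates when each batch is used once and then discarded."""
    updates = 0
    batch = []
    for step in range(num_env_steps):
        batch.append(step)  # stand-in for a transition (s, a, r, s')
        if len(batch) == batch_size:
            updates += 1    # one gradient step on fresh data
            batch.clear()   # the data is not reused
    return updates


def off_policy_updates(num_env_steps, batch_size, updates_per_step=1,
                       capacity=10_000):
    """Count updates when transitions are kept in a replay buffer."""
    buffer = deque(maxlen=capacity)
    updates = 0
    for step in range(num_env_steps):
        buffer.append(step)
        if len(buffer) >= batch_size:
            for _ in range(updates_per_step):
                # Old transitions can be drawn again and again.
                minibatch = random.sample(list(buffer), batch_size)
                updates += 1
    return updates


if __name__ == "__main__":
    steps, batch = 1000, 100
    print("on-policy updates: ", on_policy_updates(steps, batch))
    print("off-policy updates:", off_policy_updates(steps, batch))
```

With the same 1000 environment steps, the replay-based loop gets far more gradient updates out of the data. Real on-policy algorithms often run a few epochs over each batch before discarding it, which narrows the gap but does not remove it.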

🛠️ Steven Gong
