Sample Efficiency
Sample efficiency is a core argument in On-Policy Methods vs Off-Policy Methods.
By that measure, on-policy methods are actually more efficient per gradient step, but they cannot perform as many updates from the same amount of collected data. They also do not recycle data: each batch is discarded once it has been used for an update.
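The difference in update counts can be sketched with a toy count, assuming a fixed budget of environment steps. This is an illustration only: the function names, `batch_size`, `updates_per_step`, and `capacity` are hypothetical, integers stand in for transitions, and no real agent or environment is involved.

```python
import random
from collections import deque


def on_policy_updates(env_steps, batch_size):
    """Count gradient updates when each batch is used once, then discarded."""
    updates = 0
    batch = []
    for t in range(env_steps):
        batch.append(t)  # stand-in for a freshly collected transition
        if len(batch) == batch_size:
            updates += 1   # one gradient step on fresh data
            batch.clear()  # the data is not recycled
    return updates


def off_policy_updates(env_steps, batch_size, updates_per_step=1,
                       capacity=10_000, seed=0):
    """Count gradient updates when old transitions are replayed from a buffer."""
    rng = random.Random(seed)
    buffer = deque(maxlen=capacity)  # assumed replay buffer; oldest items drop out
    updates = 0
    for t in range(env_steps):
        buffer.append(t)
        if len(buffer) >= batch_size:
            for _ in range(updates_per_step):
                # Sample a minibatch of stored transitions (reused data).
                minibatch = rng.sample(list(buffer), batch_size)
                assert len(minibatch) == batch_size
                updates += 1
    return updates


if __name__ == "__main__":
    print("on-policy updates: ", on_policy_updates(1000, 100))
    print("off-policy updates:", off_policy_updates(1000, 100))
```

The sketch only counts updates for the same data budget. It says nothing about how much each update improves the policy, which is where the per-gradient-step comparison above comes in.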