IMPALA: Scalable Distributed Deep-RL with Importance Weighted Actor-Learner Architectures

First saw this when I was playing around with OGBench (though I think that was just the IMPALA encoder?). I'm realizing it has much broader implications. It's cited by Learning to Drive from a World Model, for example.

Multiple actors act in parallel and generate experiences that are sent to a central learner.
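
To make the actor/learner split concrete for myself, here is a rough toy sketch using threads and a queue. It is not the paper's implementation: the names (`actor`, `learner`, `run`, `unroll_length`, `batch_size`) are my own, the "environment" is just random numbers, and the learner only groups trajectories into batches where a real one would take a gradient step. It also leaves out the parts that make IMPALA work in practice, namely actors periodically pulling fresh policy parameters from the learner and the V-trace off-policy correction.

```python
import queue
import random
import threading


def actor(actor_id, traj_queue, num_trajectories, unroll_length):
    """Hypothetical actor: generates fixed-length trajectories and sends them on."""
    rng = random.Random(actor_id)
    for _ in range(num_trajectories):
        # Stand-in for stepping a real environment with the current policy.
        trajectory = [(actor_id, step, rng.random()) for step in range(unroll_length)]
        traj_queue.put(trajectory)


def learner(traj_queue, total_trajectories, batch_size):
    """Hypothetical learner: pulls trajectories off the queue and batches them."""
    batches, batch = [], []
    for _ in range(total_trajectories):
        batch.append(traj_queue.get())  # blocks until some actor has produced data
        if len(batch) == batch_size:
            batches.append(batch)  # a real learner would do a gradient update here
            batch = []
    return batches


def run(num_actors=4, num_trajectories=3, unroll_length=5, batch_size=2):
    traj_queue = queue.Queue()
    threads = [
        threading.Thread(
            target=actor, args=(i, traj_queue, num_trajectories, unroll_length)
        )
        for i in range(num_actors)
    ]
    for t in threads:
        t.start()
    batches = learner(traj_queue, num_actors * num_trajectories, batch_size)
    for t in threads:
        t.join()
    return batches


if __name__ == "__main__":
    batches = run()
    print(f"learner consumed {len(batches)} batches")
```

The point of the queue is that actors never wait for the learner to finish an update. They keep producing data, which is why the data the learner sees can come from a slightly stale policy.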