Unified World Models: Coupling Video and Action Diffusion for Pretraining on Large Robotic Datasets

By Toyota Research Institute.

This is a first look at how we might unify world models and vision-language-action models (VLAs).