World Model
A world model is a learned (or designed) internal representation of the environment in which an agent (like a robot) operates.
The original world model idea:
Some papers:
- PWM
- Dreamer line of work
Links sent by Jason:
- Video Prediction Policy https://arxiv.org/abs/2412.14803
- GAIA-2
- GR-2
- https://gen-irasim.github.io/
Faraz also told me some
Let’s reason from first principles. How do we build a world model?
It’s essentially a model of the world. You can interact with it.
So you are essentially learning the dynamics: s_{t+1} = f(s_t, a_t), i.e. the next state given the current state and the action you take.
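To make "learning the dynamics" concrete, here is a minimal sketch, not taken from any of the papers above. It assumes a toy 1-D environment where the next state is a linear function of the current state and action, and fits that function by least squares from logged transitions. The names `true_step`, `collect`, `fit_linear_dynamics`, and `imagine` are all hypothetical. Real world models replace the linear fit with a neural network, but the loop is the same idea: collect (state, action, next state) tuples, fit a predictor, then roll it forward without touching the real environment.

```python
import random


def true_step(state, action):
    """Hypothetical ground-truth environment: a 1-D linear system."""
    return 0.9 * state + 0.5 * action


def collect(n, seed=0):
    """Log n (state, action, next_state) transitions under random actions."""
    rng = random.Random(seed)
    data = []
    state = 0.0
    for _ in range(n):
        action = rng.uniform(-1.0, 1.0)
        next_state = true_step(state, action)
        data.append((state, action, next_state))
        state = next_state
    return data


def fit_linear_dynamics(data):
    """Least-squares fit of next_state ~ a*state + b*action.

    Solves the 2x2 normal equations in closed form.
    """
    sss = sum(s * s for s, _, _ in data)
    ssu = sum(s * u for s, u, _ in data)
    suu = sum(u * u for _, u, _ in data)
    ssy = sum(s * y for s, _, y in data)
    suy = sum(u * y for _, u, y in data)
    det = sss * suu - ssu * ssu
    a = (ssy * suu - suy * ssu) / det
    b = (suy * sss - ssy * ssu) / det
    return a, b


def imagine(a, b, state, actions):
    """Roll the learned model forward without touching the real environment."""
    trajectory = []
    for action in actions:
        state = a * state + b * action
        trajectory.append(state)
    return trajectory


a, b = fit_linear_dynamics(collect(200))
rollout = imagine(a, b, 1.0, [0.0, 0.0, 1.0])
```

Once `a` and `b` are fitted, `imagine` is the "you can interact with it" part: you feed in actions and get predicted states back.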
That’s pretty straightforward, but what about images? You want to learn the physics and then how the light (i.e. what the camera actually sees) updates.
In the original paper, they just use a VAE to convert each image into a latent vector.
Then, they use a mixture of Gaussians.
First, they have a vision encoder. Then, they have this RNN that predicts