World Model

A world model is a learned (or designed) internal representation of the environment in which an agent (like a robot) operates.

The original world model idea:

How are world models benchmarked?

Some papers:

Links sent by jason:

Video models

Faraz also told me some

Let’s reason from first principles. How do we build a world model?

It’s essentially a model of the world. You can interact with it.

$x_{t} + u_{t} \rightarrow \text{world model} \rightarrow x_{t+1}$

So you are essentially learning the dynamics: $x_{t+1} = f(x_{t}, u_{t})$.
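
To make "learning the dynamics" concrete, here is a minimal sketch that fits $f$ by least squares from recorded transitions. Everything specific is an assumption for illustration: the toy linear system (`A_true`, `B_true`), the noise-free data, and the choice of a linear model. A real world model would use a nonlinear function approximator.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical ground-truth system x_{t+1} = A x_t + B u_t (assumed, for illustration only).
A_true = np.array([[1.0, 0.1], [0.0, 1.0]])
B_true = np.array([[0.0], [0.1]])

# Collect transitions by applying random actions.
T = 200
X = np.zeros((T + 1, 2))
U = rng.normal(size=(T, 1))
for t in range(T):
    X[t + 1] = A_true @ X[t] + B_true @ U[t]

# Fit [A B] by least squares: inputs are (x_t, u_t), targets are x_{t+1}.
inputs = np.hstack([X[:-1], U])   # shape (T, 3)
targets = X[1:]                   # shape (T, 2)
W, *_ = np.linalg.lstsq(inputs, targets, rcond=None)  # shape (3, 2)
A_hat, B_hat = W[:2].T, W[2:].T

def f(x, u):
    """Learned dynamics: predicts x_{t+1} from (x_t, u_t)."""
    return A_hat @ x + B_hat @ u
```

Calling `f(X[0], U[0])` then plays the role of one step of the world model: state and action in, next state out.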

That’s pretty straightforward, but what about images? Then the model has to learn the physics and also how light, and therefore the rendered image, changes from one frame to the next.

In the original paper, they just use a VAE to convert each image into a latent vector.
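
A rough sketch of what "image to latent vector" looks like with a VAE-style encoder. The image size, latent size, and the linear untrained weights are all assumptions of mine, not taken from the paper. A real encoder would be a trained convolutional network. The part that carries over is the shape of the computation: predict a mean and log-variance, then sample with the reparameterization trick.

```python
import numpy as np

rng = np.random.default_rng(0)

IMG_DIM, LATENT_DIM = 64 * 64 * 3, 32   # assumed sizes, for illustration

# Untrained random weights stand in for a trained encoder.
W_mu = rng.normal(scale=0.01, size=(LATENT_DIM, IMG_DIM))
W_logvar = rng.normal(scale=0.01, size=(LATENT_DIM, IMG_DIM))

def encode(image):
    """Map an image to a latent sample z, plus the Gaussian parameters it was drawn from."""
    x = image.reshape(-1)
    mu = W_mu @ x
    logvar = W_logvar @ x
    eps = rng.standard_normal(LATENT_DIM)
    z = mu + np.exp(0.5 * logvar) * eps   # reparameterization trick
    return z, mu, logvar

image = rng.random((64, 64, 3))           # fake RGB frame with values in [0, 1)
z, mu, logvar = encode(image)
```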

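First, they have a vision encoder. Then, they have an RNN that predicts $P(z_{t+1} \mid a_{t}, z_{t}, h_{t})$, where $z_{t}$ is the latent vector, $a_{t}$ is the action (the $u_{t}$ above), and $h_{t}$ is the RNN hidden state.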
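
A minimal sketch of one step of such an RNN, to pin down what goes in and what comes out. Assumptions on my side: a plain tanh RNN cell, a single diagonal Gaussian as the output distribution, made-up sizes, and random untrained weights. The paper's actual cell type and output head are not recorded in these notes, so treat this as the simplest stand-in.

```python
import numpy as np

rng = np.random.default_rng(0)
Z_DIM, A_DIM, H_DIM = 32, 3, 64   # latent, action, hidden sizes (assumed)

# Untrained random weights, for illustration only.
W_in = rng.normal(scale=0.1, size=(H_DIM, Z_DIM + A_DIM))
W_h = rng.normal(scale=0.1, size=(H_DIM, H_DIM))
W_mu = rng.normal(scale=0.1, size=(Z_DIM, H_DIM))
W_logstd = rng.normal(scale=0.1, size=(Z_DIM, H_DIM))

def rnn_step(z_t, a_t, h_t):
    """Return the Gaussian parameters of P(z_{t+1} | a_t, z_t, h_t) and the next hidden state."""
    h_next = np.tanh(W_in @ np.concatenate([z_t, a_t]) + W_h @ h_t)
    mu = W_mu @ h_next
    std = np.exp(W_logstd @ h_next)   # exp keeps the standard deviation positive
    return mu, std, h_next

# Roll the model forward in latent space without touching any images.
z, h = rng.standard_normal(Z_DIM), np.zeros(H_DIM)
for _ in range(5):
    a = rng.standard_normal(A_DIM)            # random action
    mu, std, h = rnn_step(z, a, h)
    z = mu + std * rng.standard_normal(Z_DIM)  # sample z_{t+1}
```

The loop at the end is the interesting part: once the encoder has produced a first $z$, the RNN can keep sampling future latents on its own, which is what makes it usable as a simulator.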