pi0

Links:

https://www.physicalintelligence.company/blog/pi0
https://www.physicalintelligence.company/download/pi0.pdf

The images and proprioceptive state are encoded via corresponding encoders and then projected via a linear projection layer into the same embedding space as the language tokens.
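A minimal sketch of the projection step, in plain Python. It assumes the encoders have already produced feature vectors, and it stubs them out. The sizes (EMBED_DIM, IMG_FEAT_DIM, STATE_DIM), the random weights, the helper names and the token ordering are all hypothetical and for illustration only. The point is that every modality ends up as vectors of the same width as the language-token embeddings, so they can be concatenated into one sequence.

```python
import random

def linear(x, weight, bias):
    """Apply y = W x + b, where weight is a list of rows (out_dim x in_dim)."""
    return [sum(w * xi for w, xi in zip(row, x)) + b for row, b in zip(weight, bias)]

def init_linear(in_dim, out_dim, rng):
    """Hypothetical random init; in the real model these weights are learned."""
    weight = [[rng.uniform(-0.1, 0.1) for _ in range(in_dim)] for _ in range(out_dim)]
    bias = [0.0] * out_dim
    return weight, bias

EMBED_DIM = 8      # assumed width of the language-token embeddings
IMG_FEAT_DIM = 6   # assumed per-patch feature size from the (stubbed) image encoder
STATE_DIM = 7      # assumed proprioceptive state size (e.g. joint angles)

rng = random.Random(0)
img_proj = init_linear(IMG_FEAT_DIM, EMBED_DIM, rng)
state_proj = init_linear(STATE_DIM, EMBED_DIM, rng)

def build_tokens(image_patch_feats, state, language_embeds):
    """Project images and state into EMBED_DIM and join them with the language tokens.

    The ordering (image, language, state) is illustrative, not taken from the paper.
    """
    image_tokens = [linear(p, *img_proj) for p in image_patch_feats]
    state_token = [linear(state, *state_proj)]
    return image_tokens + language_embeds + state_token

patches = [[0.5] * IMG_FEAT_DIM for _ in range(4)]   # 4 stand-in patch features
state = [0.1] * STATE_DIM                            # stand-in robot state
lang = [[0.0] * EMBED_DIM for _ in range(3)]         # 3 stand-in language embeddings
tokens = build_tokens(patches, state, lang)
print(len(tokens), len(tokens[0]))  # 8 8
```

After projection, the transformer sees a single sequence of same-width vectors and does not need to know which modality each token came from.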

Model Architecture

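SigLIP (the vision encoder used inside PaliGemma)
PaliGemma (the pretrained vision-language model that pi0 builds on)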

"averaging over 10 trials per task"

This is the number of trials they run per task to compute the success rate.
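A small worked example of that averaging. It assumes each trial is scored as a plain pass/fail and that the overall number is the mean of the per-task rates. The task names and outcomes below are made up for illustration.

```python
def success_rate(outcomes):
    """Fraction of trials that succeeded (outcomes is a list of bools)."""
    return sum(outcomes) / len(outcomes)

# Hypothetical results: 10 trials for each of two tasks
trials = {
    "task_a": [True] * 7 + [False] * 3,
    "task_b": [True] * 9 + [False] * 1,
}

# Per-task success rate, then the mean across tasks
per_task = {name: success_rate(o) for name, o in trials.items()}
overall = sum(per_task.values()) / len(per_task)
print(per_task, overall)
```

With only 10 trials, each trial shifts a task's success rate by 10 percentage points, so small differences between methods should be read with that granularity in mind.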

Why flow-matching?

To ensure (constrain) smooth, continuous robot outputs, as opposed to random jumps in values.
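A minimal sketch of generic conditional flow matching, in plain Python, to show what the model learns and how actions are generated. It assumes a straight-line path from Gaussian noise (tau = 0) to the action (tau = 1), a squared-error loss on the velocity, and 10 Euler integration steps. These choices, the helper names and the toy "oracle" velocity field are assumptions, not pi0's exact parameterization. The oracle stands in for the trained network and is only exact when the data is a single fixed action chunk.

```python
import random

def interpolate(action, noise, tau):
    """Point on the straight path from noise (tau=0) to the action (tau=1)."""
    return [(1 - tau) * n + tau * a for a, n in zip(action, noise)]

def target_velocity(action, noise):
    """Regression target for the velocity network: d/dtau of the path above."""
    return [a - n for a, n in zip(action, noise)]

def flow_matching_loss(velocity_fn, action, rng):
    """One-sample squared-error loss; velocity_fn(x, tau) stands in for the network."""
    noise = [rng.gauss(0.0, 1.0) for _ in action]
    tau = rng.random()
    pred = velocity_fn(interpolate(action, noise, tau), tau)
    target = target_velocity(action, noise)
    return sum((p - t) ** 2 for p, t in zip(pred, target)) / len(action)

def euler_sample(velocity_fn, noise, steps=10):
    """Integrate dx/dtau = v(x, tau) from tau=0 (noise) to tau=1 (action)."""
    x, dt = list(noise), 1.0 / steps
    for k in range(steps):
        tau = k * dt
        v = velocity_fn(x, tau)
        x = [xi + dt * vi for xi, vi in zip(x, v)]
    return x

chunk = [0.0, 0.1, 0.2, 0.3]  # hypothetical 4-step action chunk for one joint

def oracle_velocity(x, tau):
    # Exact velocity when the data is only the chunk above; a trained
    # network would approximate this from observations instead.
    return [(a - xi) / (1 - tau) for a, xi in zip(chunk, x)]

rng = random.Random(0)
noise = [rng.gauss(0.0, 1.0) for _ in chunk]
sample = euler_sample(oracle_velocity, noise)
loss = flow_matching_loss(oracle_velocity, chunk, rng)
print(sample, loss)
```

The generated sample is a vector of continuous values obtained by integrating a velocity field, rather than a sequence of discrete tokens picked one at a time.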

🛠️ Steven Gong
