Robot Foundation Models

Octo: An Open-Source Generalist Robot Policy

Not widely used in practice. Maybe because it doesn’t work that well?

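Unlike OpenVLA and RT-1, which output discretized action tokens, Octo has a separate action head: it takes the output embedding from the transformer backbone (not really a pretrained VLM) and runs diffusion denoising on it to produce continuous actions.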

How is this different from Diffusion Policy?

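Not by much lol. Diffusion Policy has the same idea: take the observation embeddings and run denoising conditioned on them. The main difference is that Diffusion Policy trains its denoiser (a CNN U-Net or a transformer) from scratch for each task, whereas Octo is pretrained on a large multi-robot dataset (Open X-Embodiment).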
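To make the "take an embedding, then denoise" idea concrete, here is a rough sketch of a DDPM-style sampling loop for an action head. This is not Octo's or Diffusion Policy's actual code: `toy_noise_predictor` and `toy_target_action` are made-up stand-ins for the learned network (here the "correct" action is just a fixed function of the embedding), and the linear beta schedule and step count are assumptions picked for illustration.

```python
import math
import random


def make_schedule(num_steps, beta_start=1e-4, beta_end=0.02):
    """Linear beta schedule plus cumulative products of alpha (assumed values)."""
    betas = [beta_start + (beta_end - beta_start) * i / (num_steps - 1)
             for i in range(num_steps)]
    alpha_bars, running = [], 1.0
    for beta in betas:
        running *= 1.0 - beta
        alpha_bars.append(running)
    return betas, alpha_bars


def toy_target_action(embedding):
    # Hypothetical: pretend the right action is a fixed function of the embedding.
    return [2.0 * e - 1.0 for e in embedding]


def toy_noise_predictor(noisy_action, embedding, alpha_bar):
    # Hypothetical stand-in for the learned network eps(x_t, t, embedding).
    # It returns the noise that would explain x_t if the clean action were the target.
    target = toy_target_action(embedding)
    return [(x - math.sqrt(alpha_bar) * a) / math.sqrt(1.0 - alpha_bar)
            for x, a in zip(noisy_action, target)]


def sample_action(embedding, num_steps=20, seed=0):
    """Start from Gaussian noise and denoise step by step, conditioned on the embedding."""
    rng = random.Random(seed)
    betas, alpha_bars = make_schedule(num_steps)
    action = [rng.gauss(0.0, 1.0) for _ in embedding]
    for t in reversed(range(num_steps)):
        eps = toy_noise_predictor(action, embedding, alpha_bars[t])
        coef = betas[t] / math.sqrt(1.0 - alpha_bars[t])
        mean = [(x - coef * e) / math.sqrt(1.0 - betas[t])
                for x, e in zip(action, eps)]
        if t > 0:
            # Re-inject a little noise on every step except the last one.
            sigma = math.sqrt(betas[t])
            action = [m + sigma * rng.gauss(0.0, 1.0) for m in mean]
        else:
            action = mean
    return action


if __name__ == "__main__":
    embedding = [0.1, 0.5, 0.9]
    print(sample_action(embedding))
    print(toy_target_action(embedding))
```

In both Octo and Diffusion Policy the loop has this same shape; what changes is what the noise predictor is and how it was trained (pretrained across many robots vs. trained per task).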

Links:

Was really annoying for me to set up because I’m on a Mac.