Transfusion: Predict the Next Token and Diffuse Images with One Multi-Modal Model

Kevin mentions this is similar to pi0.