Transfusion: Predict the Next Token and Diffuse Images with One Multi-Modal Model

Kevin mentioned that this is similar to pi0.