Scalable Diffusion Models with Transformers (DiT)

Feel like I should really read this to understand diffusion transformers.