Diffusion Forcing: Next-token Prediction Meets Full-Sequence Diffusion

A diffusion model is trained to denoise a set of tokens with independent per-token noise levels.