Diffusion Forcing: Next-token Prediction Meets Full-Sequence Diffusion
A diffusion model is trained to denoise a set of tokens with independent per-token noise levels.
Diffusion forcing = Teacher Forcing + Diffusion Model
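
The per-token noising described above can be illustrated with a minimal NumPy sketch. It is not the authors' implementation: the linear beta schedule, the noise-prediction loss, and the names `make_alpha_bar`, `noise_tokens_independently`, `denoising_loss`, and `zero_model` are all assumptions chosen for illustration. The only idea taken from the text is that each token in the sequence receives its own independently sampled noise level.

```python
import numpy as np


def make_alpha_bar(num_levels: int) -> np.ndarray:
    """Cumulative signal fraction per noise level (linear beta schedule, an assumed choice)."""
    betas = np.linspace(1e-4, 0.02, num_levels)
    return np.cumprod(1.0 - betas)


def noise_tokens_independently(tokens, alpha_bar, rng):
    """Corrupt each token of a (seq_len, dim) array with its own noise level."""
    seq_len = tokens.shape[0]
    # One noise level per token, drawn independently and uniformly.
    levels = rng.integers(0, len(alpha_bar), size=seq_len)
    eps = rng.standard_normal(tokens.shape)
    a = alpha_bar[levels][:, None]  # shape (seq_len, 1), broadcasts over dim
    noisy = np.sqrt(a) * tokens + np.sqrt(1.0 - a) * eps
    return noisy, levels, eps


def denoising_loss(predict_noise, tokens, alpha_bar, rng):
    """Mean squared error between predicted and true noise (hypothetical objective)."""
    noisy, levels, eps = noise_tokens_independently(tokens, alpha_bar, rng)
    pred = predict_noise(noisy, levels)
    return float(np.mean((pred - eps) ** 2))


rng = np.random.default_rng(0)
tokens = rng.standard_normal((8, 4))  # toy sequence: 8 tokens, 4 features each
alpha_bar = make_alpha_bar(1000)

# Placeholder model that always predicts zero noise; a real model would be
# a trained network conditioned on the noisy tokens and their noise levels.
zero_model = lambda noisy, levels: np.zeros_like(noisy)

noisy, levels, eps = noise_tokens_independently(tokens, alpha_bar, rng)
loss = denoising_loss(zero_model, tokens, alpha_bar, rng)
```

In this sketch a token assigned a low level stays close to its clean value, while a token assigned a high level is almost pure noise, so a single training sequence mixes nearly clean and nearly fully noised tokens.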