Diffusion Forcing: Next-token Prediction Meets Full-Sequence Diffusion
A diffusion model is trained to denoise a set of tokens with independent per-token noise levels.
A diffusion model is trained to denoise a set of tokens with independent per-token noise levels.