Learning Long-Context Diffusion Policies via Past-Token Prediction (PTP)

we use them as a ground-truth reference and select the candidate whose reconstructed past actions best match the executed ones:

In particular, they use past actions for the action consistency, as opposed to the current action.