Learning Long-Context Diffusion Policies via Past-Token Prediction (PTP)
we use them as a ground-truth reference and select the candidate whose reconstructed past actions best match the executed ones:
In particular, they use past actions for the action consistency, as opposed to the current action.