Preference Optimization as Probabilistic Inference

Mentioned in the DreamerV4 paper.