Gaussian Policy

From Spinning Up:

A simple Gaussian policy:

  • Has only one mode (i.e., it’s unimodal).
  • Has limited flexibility: the shape of the distribution is fixed to be Gaussian.
  • Often uses diagonal covariance, so it can’t capture dependencies between action dimensions.
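To make the three points above concrete, here is a minimal NumPy sketch of a diagonal Gaussian policy head. The names `sample_diag_gaussian` and `log_prob_diag_gaussian` and the fixed `mean` / `log_std` values are assumptions for illustration only; in a real agent the mean (and possibly the log-std) would come from a network conditioned on the state.

```python
import numpy as np

def sample_diag_gaussian(mean, log_std, n, rng):
    """Draw n actions from N(mean, diag(exp(log_std))^2)."""
    std = np.exp(log_std)
    # Each action dimension gets its own independent noise term.
    return mean + std * rng.standard_normal((n, mean.shape[0]))

def log_prob_diag_gaussian(actions, mean, log_std):
    """Log-density of a diagonal Gaussian, summed over action dimensions."""
    z = (actions - mean) / np.exp(log_std)
    return -0.5 * np.sum(z ** 2 + 2.0 * log_std + np.log(2.0 * np.pi), axis=-1)

rng = np.random.default_rng(0)
mean = np.array([0.5, -1.0])      # hypothetical policy mean for one state
log_std = np.array([-0.5, 0.0])   # hypothetical per-dimension log-std

actions = sample_diag_gaussian(mean, log_std, 10_000, rng)

# With a diagonal covariance, the two action dimensions are uncorrelated
# by construction, so the sample correlation should be close to zero.
corr = np.corrcoef(actions.T)[0, 1]
print("sample correlation between action dims:", corr)
print("log-prob at the mean:", log_prob_diag_gaussian(mean, mean, log_std))
```

The density has a single peak at `mean`, its shape is fully determined by `mean` and `log_std`, and no setting of those parameters can make the action dimensions depend on each other.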

This paper discusses this limitation: https://arxiv.org/pdf/2507.07986

For example, PPO in continuous action spaces typically learns a Gaussian policy.

But we also have more expressive policy classes (e.g., diffusion models and flow-matching models).

The paper contrasts this simple Gaussian policy class with expressive policies, like:

  • Diffusion policies (which model action sequences as samples from a learned diffusion process)
  • Flow-matching models (which can model complex, multimodal behaviors with learned transport maps)
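As a rough illustration of how a flow-based sampler can produce multimodal actions, the toy sketch below integrates a velocity field with Euler steps, starting from Gaussian noise. It is not the paper's method: to stay self-contained it assumes a 1-D action, a standard normal source, a straight-line interpolation path, and two equally likely target actions at -2 and +2, for which the velocity field has a closed form. A real flow-matching policy would replace `marginal_velocity` (a hypothetical name) with a learned, state-conditioned network.

```python
import numpy as np

def marginal_velocity(x, t, modes):
    """Closed-form velocity for a N(0, 1) source and equally weighted
    point-mass targets, under the path x_t = (1 - t) * x0 + t * x1."""
    s = 1.0 - t
    # Given target x1, x_t ~ N(t * x1, s^2); use this as a log-weight per mode.
    logits = -((x[:, None] - t * modes[None, :]) ** 2) / (2.0 * s ** 2)
    logits -= logits.max(axis=1, keepdims=True)   # stabilise the softmax
    w = np.exp(logits)
    w /= w.sum(axis=1, keepdims=True)
    expected_target = w @ modes                   # E[x1 | x_t = x]
    return (expected_target - x) / s

def sample_flow(n, modes, steps=100, seed=0):
    """Sample by Euler-integrating dx/dt = v(x, t) from t = 0 to t = 1."""
    rng = np.random.default_rng(seed)
    x = rng.standard_normal(n)                    # start from unimodal noise
    dt = 1.0 / steps
    for k in range(steps):
        x = x + dt * marginal_velocity(x, k * dt, modes)
    return x

modes = np.array([-2.0, 2.0])                     # two equally good toy actions
flow_actions = sample_flow(1000, modes)

print("fraction near +2:", np.mean(flow_actions > 0))
print("fraction near -2:", np.mean(flow_actions < 0))
```

The samples start as a single Gaussian blob and end up split between the two target actions, which a single Gaussian policy cannot represent: its best fit to the same targets would be centred between them, near 0.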