Gaussian Policy
According to OpenAI's Spinning Up, a simple Gaussian policy:
- Outputs only one mode (i.e., it’s unimodal).
- Has limited flexibility: the shape of the distribution is fixed to be Gaussian.
- Often uses diagonal covariance, so it can’t capture dependencies between action dimensions.
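To make the three limitations above concrete, here is a minimal sketch in plain Python. It is not code from Spinning Up or from the paper; the function names and the bimodal "steer left / steer right" data are assumptions made up for illustration. It shows how a diagonal Gaussian samples each action dimension independently, and how a maximum-likelihood Gaussian fit to two-mode data puts its mean between the modes.

```python
import math
import random

def sample_action(mean, log_std, rng):
    """Draw one action from a diagonal Gaussian; each dimension is sampled independently."""
    return [m + math.exp(ls) * rng.gauss(0.0, 1.0) for m, ls in zip(mean, log_std)]

def log_prob(action, mean, log_std):
    """Log-density of a diagonal Gaussian: a sum of independent 1-D Gaussian log-densities."""
    total = 0.0
    for a, m, ls in zip(action, mean, log_std):
        z = (a - m) / math.exp(ls)
        total += -0.5 * z * z - ls - 0.5 * math.log(2.0 * math.pi)
    return total

def fit_gaussian_1d(samples):
    """Maximum-likelihood mean and standard deviation for 1-D data."""
    n = len(samples)
    mean = sum(samples) / n
    var = sum((x - mean) ** 2 for x in samples) / n
    return mean, math.sqrt(var)

# Hypothetical bimodal demonstrations: the expert steers left (-1) or right (+1).
rng = random.Random(0)
demos = [rng.choice([-1.0, 1.0]) + 0.05 * rng.gauss(0.0, 1.0) for _ in range(2000)]

# A single Gaussian cannot represent both modes, so the fitted mean falls
# between them and the fitted std is stretched to cover both.
fit_mean, fit_std = fit_gaussian_1d(demos)
```

The fitted mean ends up close to 0, a region where the demonstrations contain almost no actions. Because `log_prob` is a plain sum over dimensions, there is also no term that could express a correlation between two action dimensions.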
This paper discusses the problem: https://arxiv.org/pdf/2507.07986
For example, PPO typically learns a Gaussian policy for continuous actions.
But more expressive policy classes exist (e.g., diffusion models and flow-matching models).
The paper contrasts this simple Gaussian policy class with expressive policies, such as:
- Diffusion policies (which model action sequences as samples from a learned diffusion process)
- Flow-matching models (which can model complex, multimodal behaviors with learned transport maps)
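As a rough illustration of why a transport map can be multimodal, here is a toy sketch in plain Python. It is not the paper's method and nothing in it is learned: `toy_velocity` is a hand-written, hypothetical stand-in for the velocity network that a flow-matching model would train, and the two modes at -1 and +1 are made up. The sampler starts from Gaussian noise and integrates dx/dt = v(x, t) from t = 0 to t = 1 with Euler steps.

```python
import random

def toy_velocity(x, t):
    """Hand-written stand-in for a learned velocity field v(x, t).

    It moves x along a straight line toward the nearer of two modes at -1 and +1.
    """
    target = 1.0 if x >= 0.0 else -1.0
    return (target - x) / (1.0 - t)

def sample_action(rng, num_steps=20):
    """Start from Gaussian noise and integrate dx/dt = v(x, t) over [0, 1] with Euler steps."""
    x = rng.gauss(0.0, 1.0)
    dt = 1.0 / num_steps
    for k in range(num_steps):
        # t = k * dt stays strictly below 1, so the division in toy_velocity is safe.
        x += dt * toy_velocity(x, k * dt)
    return x

rng = random.Random(0)
actions = [sample_action(rng) for _ in range(1000)]
```

Every sample ends up at one of the two modes, and both modes receive samples, even though the starting noise is a single unimodal Gaussian. A real flow-matching or diffusion policy would replace `toy_velocity` with a trained network conditioned on the observation.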