Categorical Policy
Learned this from Spinning Up: https://spinningup.openai.com/en/latest/spinningup/rl_intro.html
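A categorical policy is like a classifier over discrete actions: the network outputs one logit per action, and a softmax turns the logits into action probabilities.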
- It’s essentially the same idea that is used to train a next-token autoregressive model in the LLM world: a softmax over a discrete set of choices (tokens there, actions here).
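
To make the classifier analogy concrete, here is a minimal sketch in plain Python (no RL library). The function names `softmax`, `sample_action`, and `log_prob`, as well as the example logits, are my own placeholders, not anything from Spinning Up. It shows the two operations a categorical policy needs: sampling an action and computing the log-likelihood of a chosen action.

```python
import math
import random

def softmax(logits):
    """Turn raw scores (logits) into a probability distribution."""
    m = max(logits)  # subtract the max for numerical stability
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]

def sample_action(logits, rng=random):
    """Sample one action index according to the softmax probabilities."""
    probs = softmax(logits)
    return rng.choices(range(len(probs)), weights=probs, k=1)[0]

def log_prob(logits, action):
    """Log-likelihood of `action`: logit minus log-sum-exp of all logits."""
    m = max(logits)
    log_z = m + math.log(sum(math.exp(x - m) for x in logits))
    return logits[action] - log_z

# Hypothetical network output for a 3-action environment.
logits = [2.0, 0.5, -1.0]
probs = softmax(logits)
action = sample_action(logits, random.Random(0))
print(probs, action, log_prob(logits, action))
```

The negative of `log_prob` is the usual cross-entropy loss for a classifier or next-token model, which is where the LLM analogy comes from. The difference, as far as I understand it, is that in policy gradient methods the log-probability is weighted by a return or advantage rather than pushed toward a fixed label.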