🛠️ Steven Gong

May 24, 2026, 1 min read

Mixture-of-Experts (MoE)

I’m lacking a good mental model for how this works.

I think the Stanford CS336 lecture on MoE is probably a good place to start.
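As a rough mental model, here is a minimal sketch of a single top-k routed MoE layer applied to one token, written in plain Python so it has no dependencies. A router scores every expert, only the `top_k` highest-scoring experts actually run, and their outputs are added together, weighted by the router's renormalized probabilities. All names (`MoELayer`, `Expert`, `route`) and sizes are hypothetical illustrations, not the CS336 code or any library's API. Real layers batch tokens, include biases, and usually train the router with an auxiliary load-balancing loss.

```python
import math
import random

def matvec(W, x):
    """Multiply a matrix W (list of rows) by a vector x."""
    return [sum(w * xi for w, xi in zip(row, x)) for row in W]

def rand_matrix(rows, cols, rng):
    """Small random weights; a stand-in for trained parameters."""
    return [[rng.gauss(0.0, 0.1) for _ in range(cols)] for _ in range(rows)]

def softmax(logits):
    m = max(logits)  # subtract the max for numerical stability
    exps = [math.exp(v - m) for v in logits]
    s = sum(exps)
    return [e / s for e in exps]

class Expert:
    """One expert = a tiny two-layer FFN: d_model -> d_hidden -> d_model with ReLU."""
    def __init__(self, d_model, d_hidden, rng):
        self.w_in = rand_matrix(d_hidden, d_model, rng)
        self.w_out = rand_matrix(d_model, d_hidden, rng)

    def __call__(self, x):
        h = [max(0.0, v) for v in matvec(self.w_in, x)]
        return matvec(self.w_out, h)

class MoELayer:
    """Hypothetical top-k routed mixture of experts for a single token vector."""
    def __init__(self, d_model, d_hidden, n_experts, top_k, seed=0):
        rng = random.Random(seed)
        self.experts = [Expert(d_model, d_hidden, rng) for _ in range(n_experts)]
        self.w_router = rand_matrix(n_experts, d_model, rng)  # one logit per expert
        self.top_k = top_k

    def route(self, x):
        """Return {expert_index: gate_weight} for the top_k experts."""
        probs = softmax(matvec(self.w_router, x))
        top = sorted(range(len(probs)), key=lambda i: probs[i], reverse=True)[: self.top_k]
        total = sum(probs[i] for i in top)
        return {i: probs[i] / total for i in top}  # renormalize over the chosen experts

    def __call__(self, x):
        gates = self.route(x)
        out = [0.0] * len(x)
        for i, g in gates.items():
            y = self.experts[i](x)  # only the selected experts are evaluated
            out = [o + g * yi for o, yi in zip(out, y)]  # gate-weighted sum
        return out, gates

layer = MoELayer(d_model=4, d_hidden=8, n_experts=8, top_k=2)
y, gates = layer([0.5, -1.0, 0.25, 2.0])
print(gates)  # {expert_index: gate_weight} for the 2 chosen experts
```

Renormalizing the gates over the chosen experts is one design choice (Mixtral effectively does this by applying the softmax to the top-k logits only); other MoE variants keep the unnormalized top-k probabilities. Either way, the layer output is a weighted sum of a few expert outputs.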

Some interesting ideas that I need to wrap my head around:

Fine-grained experts vs coarse experts
- if you’re restricted to the same number of parameters, perhaps it is better to split them further into many smaller experts, rather than use a few experts with large FFNs
  - because the selected experts’ outputs are added together at the end, maybe more, smaller experts can represent richer information (there are far more possible combinations)?
- How does this relate to having more shared experts?
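A back-of-the-envelope way to see the fine-grained argument is to hold total and active FFN parameters fixed and count how many distinct expert subsets a token can be routed to. The sketch below splits each of 8 coarse experts into 4 smaller ones (hidden size divided by 4, 4× as many experts, 4× larger `top_k`), which matches the "fine-grained expert segmentation" idea from DeepSeekMoE; the last config also turns one expert into an always-on shared expert. The sizes (`d_model=1024`, `d_hidden=4096`, 8 experts, top-2) are made-up illustrations, biases and router weights are ignored, and `moe_config` is a hypothetical helper.

```python
from math import comb

def ffn_params(d_model: int, d_hidden: int) -> int:
    """Weight count of one two-layer FFN expert (d_model -> d_hidden -> d_model), no biases."""
    return 2 * d_model * d_hidden

def moe_config(d_model, d_hidden, n_routed, top_k, n_shared=0):
    """Return (total params, active params per token, number of routed-expert subsets)."""
    per_expert = ffn_params(d_model, d_hidden)
    total = (n_routed + n_shared) * per_expert
    active = (top_k + n_shared) * per_expert  # shared experts run for every token
    combos = comb(n_routed, top_k)            # ways to pick top_k of n_routed experts
    return total, active, combos

configs = {
    # name: (d_model, d_hidden, n_routed, top_k, n_shared)
    "coarse": (1024, 4096, 8, 2, 0),          # combos = 28
    "fine-grained": (1024, 1024, 32, 8, 0),   # combos = 10,518,300
    "fine + shared": (1024, 1024, 31, 7, 1),  # combos = 2,629,575
}

for name, cfg in configs.items():
    total, active, combos = moe_config(*cfg)
    # every config has total=67,108,864 and active=16,777,216
    print(f"{name:>13}: total={total:,} active={active:,} combos={combos:,}")
```

All three configs spend the same parameters, yet the fine-grained one offers about 10.5 million possible expert combinations versus 28 for the coarse one. Moving one expert into the shared slot gives up some of that flexibility (C(31, 7) instead of C(32, 8)) in exchange for capacity every token uses, which is one way to frame the shared-experts question.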

Resources

https://huggingface.co/blog/moe

Created with Quartz, © 2026