🛠️ Steven Gong


Mar 25, 2025, 1 min read


Multi-Head Latent Attention (MLA)

Heard about this on a Lex Fridman podcast: https://www.youtube.com/watch?v=PncVSWbxdWU

Damnnn DeepSeek was behind MLA! They introduced it in DeepSeek-V2: instead of caching full per-head keys and values, each token's hidden state gets compressed into a small latent vector, and only that latent is cached, which massively shrinks the KV cache at inference time.
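Quick sketch of the core idea as I understand it, just to make it concrete. This is my own minimal illustration, not DeepSeek's actual implementation: it skips RoPE and the decoupled rotary keys of the real DeepSeek-V2 design, and names like `MLASketch` and `d_latent` are mine.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class MLASketch(nn.Module):
    """Minimal Multi-Head Latent Attention sketch.

    Real DeepSeek-V2 MLA also low-rank-compresses queries and adds a
    decoupled RoPE key path; both are omitted here for brevity.
    """

    def __init__(self, d_model=512, n_heads=8, d_latent=64):
        super().__init__()
        self.n_heads = n_heads
        self.d_head = d_model // n_heads
        # Queries projected as in vanilla multi-head attention.
        self.w_q = nn.Linear(d_model, d_model, bias=False)
        # Down-projection: one small latent per token is all we cache.
        self.w_dkv = nn.Linear(d_model, d_latent, bias=False)
        # Up-projections reconstruct per-head keys/values from the latent.
        self.w_uk = nn.Linear(d_latent, d_model, bias=False)
        self.w_uv = nn.Linear(d_latent, d_model, bias=False)
        self.w_o = nn.Linear(d_model, d_model, bias=False)

    def forward(self, x, latent_cache=None):
        b, t, _ = x.shape
        # The latent c is the only thing stored in the KV cache:
        # (b, t, d_latent) instead of (b, t, 2 * n_heads * d_head).
        c = self.w_dkv(x)
        if latent_cache is not None:
            c = torch.cat([latent_cache, c], dim=1)
        s = c.shape[1]
        q = self.w_q(x).view(b, t, self.n_heads, self.d_head).transpose(1, 2)
        k = self.w_uk(c).view(b, s, self.n_heads, self.d_head).transpose(1, 2)
        v = self.w_uv(c).view(b, s, self.n_heads, self.d_head).transpose(1, 2)
        # Causal masking omitted to keep the sketch short.
        out = F.scaled_dot_product_attention(q, k, v)
        out = out.transpose(1, 2).reshape(b, t, -1)
        return self.w_o(out), c  # latent c becomes next step's cache

mla = MLASketch()
x = torch.randn(2, 10, 512)
y, cache = mla(x)
print(y.shape, cache.shape)  # (2, 10, 512) and a (2, 10, 64) cache
```

The punchline is the cache shape: 64 floats per token here versus 1024 (8 heads × 64 dims × K and V) for vanilla MHA with the same model width.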

https://www.youtube.com/watch?v=0VLAoVGf_74

Check out this guy https://fxmeng.github.io/#two

