Multi-Head Latent Attention (MLA)
Heard about this from a Lex Fridman podcast: https://www.youtube.com/watch?v=PncVSWbxdWU
Damnnn DeepSeek was behind MLA!
Check out this guy https://fxmeng.github.io/#two