
Multi-Query Attention (MQA)

Multi-head attention consists of multiple attention layers (heads) running in parallel, each with its own linear transformations of the queries, keys, values, and outputs. Multi-query attention is identical except that all heads share a single set of keys and values; only the queries remain per-head, which shrinks the key/value projections and the KV cache during decoding.

  • https://paperswithcode.com/method/multi-query-attention
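
A minimal PyTorch sketch of the idea (module name, shapes, and layer names are my own, not taken from the linked page): each head keeps its own query projection, while one shared key projection and one shared value projection are broadcast across all heads.

```python
import torch
import torch.nn.functional as F
from torch import nn

class MultiQueryAttention(nn.Module):
    """Sketch of multi-query attention: per-head queries, one shared K and V."""
    def __init__(self, d_model: int, n_heads: int):
        super().__init__()
        assert d_model % n_heads == 0
        self.n_heads = n_heads
        self.d_head = d_model // n_heads
        self.q_proj = nn.Linear(d_model, d_model)      # per-head query projections
        self.k_proj = nn.Linear(d_model, self.d_head)  # single shared key projection
        self.v_proj = nn.Linear(d_model, self.d_head)  # single shared value projection
        self.out_proj = nn.Linear(d_model, d_model)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        B, T, _ = x.shape
        # Queries: (B, n_heads, T, d_head)
        q = self.q_proj(x).view(B, T, self.n_heads, self.d_head).transpose(1, 2)
        # Keys/values: (B, 1, T, d_head), broadcast over the head dimension
        k = self.k_proj(x).unsqueeze(1)
        v = self.v_proj(x).unsqueeze(1)
        attn = F.softmax(q @ k.transpose(-2, -1) / self.d_head ** 0.5, dim=-1)
        out = (attn @ v).transpose(1, 2).reshape(B, T, -1)
        return self.out_proj(out)

x = torch.randn(2, 16, 64)
print(MultiQueryAttention(d_model=64, n_heads=8)(x).shape)  # torch.Size([2, 16, 64])
```

Compared with standard multi-head attention, the K and V tensors here are a factor of n_heads smaller, which is where the KV-cache savings come from.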

Related

  • Grouped Query Attention
