Paged Attention

PagedAttention is the main contribution of vLLM.

Traditional LLM inference maintains each request's Key-Value (KV) cache in a single contiguous region of GPU memory. Because the final sequence length is not known up front, memory must be reserved for the longest possible output, so the cache suffers from over-reservation and fragmentation, and the waste grows as sequence lengths grow. PagedAttention instead splits the KV cache into fixed-size blocks and maps each sequence's logical blocks to scattered physical blocks through a block table, analogous to virtual-memory paging.
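
To make the block-table indirection concrete, here is a minimal sketch of a paged KV-cache allocator. It is not the vLLM implementation; the names (`BLOCK_SIZE`, `BlockAllocator`, `Sequence`) and the block size of 16 tokens are assumptions chosen only to illustrate the idea that a sequence's KV blocks need not be contiguous in the physical pool.

```python
from typing import List

BLOCK_SIZE = 16  # tokens per KV block (assumed value for illustration)


class BlockAllocator:
    """Hands out fixed-size physical KV blocks from a shared pool."""

    def __init__(self, num_physical_blocks: int):
        self.free_blocks: List[int] = list(range(num_physical_blocks))

    def allocate(self) -> int:
        if not self.free_blocks:
            raise MemoryError("KV cache exhausted")
        return self.free_blocks.pop()

    def free(self, block_id: int) -> None:
        self.free_blocks.append(block_id)


class Sequence:
    """Tracks one request's logical-to-physical KV block mapping."""

    def __init__(self, allocator: BlockAllocator):
        self.allocator = allocator
        self.block_table: List[int] = []  # logical block index -> physical block id
        self.num_tokens = 0

    def append_token(self) -> None:
        # Grab a new physical block only when the current one is full,
        # so at most BLOCK_SIZE - 1 slots are ever wasted per sequence.
        if self.num_tokens % BLOCK_SIZE == 0:
            self.block_table.append(self.allocator.allocate())
        self.num_tokens += 1

    def physical_slot(self, token_idx: int) -> int:
        # Translate a token position into its slot in the physical pool.
        block_id = self.block_table[token_idx // BLOCK_SIZE]
        return block_id * BLOCK_SIZE + token_idx % BLOCK_SIZE


if __name__ == "__main__":
    allocator = BlockAllocator(num_physical_blocks=1024)
    seq = Sequence(allocator)
    for _ in range(40):            # generate 40 tokens
        seq.append_token()
    print(seq.block_table)         # 3 physical blocks, not necessarily adjacent
    print(seq.physical_slot(37))   # where token 37's KV entries live in the pool
```

Because blocks are allocated on demand and freed back to a shared pool when a request finishes, memory is reserved roughly in proportion to the tokens actually generated rather than the worst-case sequence length.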