Paged Attention
PagedAttention is the main contribution of vLLM. Traditional LLM inference engines keep each sequence's Key-Value (KV) cache in a contiguous region of GPU memory, which forces them to reserve space for the maximum possible sequence length up front; as sequences grow and finish, this causes fragmentation and wasted memory. PagedAttention instead splits the KV cache into fixed-size blocks that can live in non-contiguous physical memory, with a per-sequence block table mapping logical token positions to physical blocks, analogous to virtual-memory paging in an operating system.
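To make the idea concrete, here is a minimal sketch of a paged KV cache in Python. It is not vLLM's actual implementation or API; the class name `PagedKVCache`, its methods, and the block size are illustrative assumptions, showing only the block table and the shared pool of physical blocks.

```python
import torch

BLOCK_SIZE = 16  # tokens per KV block (illustrative; vLLM's default block size is 16)


class PagedKVCache:
    """Hypothetical sketch: KV vectors live in fixed-size physical blocks,
    and each sequence keeps a block table mapping its logical blocks to
    physical block ids. Not vLLM's real data structures."""

    def __init__(self, num_blocks: int, num_heads: int, head_dim: int):
        # One pre-allocated pool of physical blocks shared by all sequences.
        self.key_blocks = torch.zeros(num_blocks, BLOCK_SIZE, num_heads, head_dim)
        self.value_blocks = torch.zeros(num_blocks, BLOCK_SIZE, num_heads, head_dim)
        self.free_blocks = list(range(num_blocks))
        self.block_tables: dict[int, list[int]] = {}  # seq_id -> physical block ids

    def append(self, seq_id: int, pos: int, k: torch.Tensor, v: torch.Tensor) -> None:
        """Write the KV vectors for token `pos` of sequence `seq_id`."""
        table = self.block_tables.setdefault(seq_id, [])
        logical_block, offset = divmod(pos, BLOCK_SIZE)
        if logical_block == len(table):
            # Only allocate a new physical block at a block boundary,
            # so waste is bounded by one partially filled block per sequence.
            table.append(self.free_blocks.pop())
        phys = table[logical_block]
        self.key_blocks[phys, offset] = k
        self.value_blocks[phys, offset] = v

    def free(self, seq_id: int) -> None:
        """Return a finished sequence's blocks to the shared free pool."""
        self.free_blocks.extend(self.block_tables.pop(seq_id, []))
```

The key design point this sketch illustrates is that memory is allocated one small block at a time as a sequence grows, rather than as one contiguous maximum-length region, and freed blocks can immediately be reused by other sequences.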