Latency Hiding

Latency hiding is the general problem of overlapping compute and I/O: keep the ALU and the load/store unit busy in parallel so that long-latency loads don't waste issue slots. MIT 6.S894 Lec 5 (slides 11-14) frames it as the design choice that splits CPUs from GPUs.

Two approaches from the same slide:

  • CPU-style: ILP; hoist loads early so independent instructions execute under the miss. Parallelism within one instruction stream. Mechanism in Miss Shadow
  • GPU-style: multithreading; when one stream stalls, switch to another. Parallelism across instruction streams. Mechanism in Warp Scheduling
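
The GPU-style approach can be sketched with a toy cycle model (my own illustrative assumption, not from the lecture): each thread repeatedly issues a load whose dependent use can't run until the miss latency has elapsed, and a round-robin scheduler issues from whichever thread is ready.

```python
# Toy cycle model of GPU-style latency hiding. Parameters are made up
# for illustration; the point is that more threads fill the miss shadow.

MISS_LATENCY = 8    # cycles until a load's result is available
OPS_PER_THREAD = 4  # dependent load/use pairs each thread executes

def cycles(num_threads: int) -> int:
    """Round-robin issue, at most one instruction per cycle; a thread
    whose pending load hasn't completed is skipped."""
    ready_at = [0] * num_threads               # cycle a thread may issue again
    remaining = [OPS_PER_THREAD] * num_threads
    clock = 0
    while any(remaining):
        for t in range(num_threads):
            if remaining[t] and ready_at[t] <= clock:
                # issue the load now; the dependent use pins the thread
                # until MISS_LATENCY cycles from now
                ready_at[t] = clock + MISS_LATENCY
                remaining[t] -= 1
                break                           # one issue slot per cycle
        clock += 1
    return clock

print(cycles(1))  # one stream: every miss stalls the pipeline
print(cycles(8))  # eight streams: 8x the work, stalls overlap
```

With one thread, almost every cycle is a stall; with eight, the scheduler finds a ready thread nearly every cycle, so total cycles barely grow while throughput scales.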

Ragan-Kelley notes that the GPU side involves “more manual & explicit management of overlapping,” and that this split recurs at many levels of the memory hierarchy (asynchronous fetch into scratchpad, double-buffering, warp specialization).
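
Double-buffering is one concrete instance of that explicit overlap. A hypothetical host-side sketch (all names illustrative; a Python thread stands in for an async copy engine): fetch tile i+1 into the spare buffer while computing on tile i.

```python
import threading

def slow_load(i):
    # stands in for a long-latency fetch (e.g. global -> scratchpad copy)
    return list(range(i * 4, i * 4 + 4))

def process_tiles(num_tiles):
    results = []
    buffers = [slow_load(0), None]  # buffer 0 holds tile 0 up front
    for i in range(num_tiles):
        cur, nxt = i % 2, (i + 1) % 2
        t = None
        if i + 1 < num_tiles:
            # kick off the next fetch BEFORE computing on the current tile,
            # so the copy overlaps the compute below
            def prefetch(j=i + 1, b=nxt):
                buffers[b] = slow_load(j)
            t = threading.Thread(target=prefetch)
            t.start()
        results.append(sum(buffers[cur]))  # "compute" runs under the fetch
        if t:
            t.join()  # ensure the spare buffer is filled before reuse
    return results

print(process_tiles(3))
```

The join before reusing a buffer is the synchronization point that warp specialization or async-copy barriers play on a real GPU.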