CPU Performance Walls

Memory Wall

The memory wall is the widening gap between CPU speed and main memory speed: CPUs got faster much quicker than DRAM did, so on modern processors the bottleneck is no longer compute but waiting on memory.

Why?

From roughly 1980–2005, CPU frequency doubled about every 2 years while DRAM speed doubled only every ~6 years. The ratio compounded, so by the mid-2000s a single cache miss cost hundreds of compute cycles. Runtime used to be dominated by page faults (disk); now it’s dominated by cache misses (DRAM). Basically every modern CPU trick (multi-level caches, the miss shadow, out-of-order execution, prefetching) exists to hide this gap.

From the Sun World Wide Analyst Conference 2003:

                  CPU freq scaling   DRAM scaling
doubling period   every 2 years      every 6 years

Cost of a memory access today:

Level   Latency
L1      2–3 cycles
L2      tens of cycles
L3      tens to ~100+ cycles
DRAM    200–300 cycles

Why DRAM stays slow. DRAM is 1 transistor + 1 capacitor per bit: cheap and dense but physically slow (capacitors need refreshes, rows need to be activated). SRAM is 6 transistors per bit: fast but uses ~6× the area.

SRAM really uses 6x the area??

That’s why SRAM is used only for on-die caches. DDR improved bandwidth (two transfers per cycle), not latency.

Mitigations (all responses to the wall):

  • Multi-level cache hierarchy (L1/L2/L3) with staged latency
  • Miss shadow: useful work during a pending load
  • OoO + renaming: overlap multiple misses
  • Prefetching: pull likely-needed data into cache early
  • SSDs/nonvolatile memory: the next layer down is also getting closer to DRAM speed

From ECE459 L06. One of the four walls alongside the power wall, ILP limits, and the speed of light.