Memory Wall
The memory wall is the widening gap between CPU speed and main memory speed: CPU performance improved much faster than DRAM latency did, so on modern processors the bottleneck is usually not compute but waiting on memory.
Why?
From roughly 1980–2005, CPU frequency doubled about every 2 years while DRAM speed doubled only every ~6 years. The gap compounded, so by the mid-2000s a single cache miss cost hundreds of compute cycles. Runtime used to be dominated by page faults (disk); now it’s dominated by cache misses (DRAM). Basically every modern CPU trick (multi-level caches, the miss shadow, out-of-order execution (OoO), prefetching) exists to hide this gap.
From the Sun World Wide Analyst Conference 2003:
| | CPU frequency | DRAM speed |
|---|---|---|
| Doubling period | every ~2 years | every ~6 years |
Cost of a memory access today:
| Level | Latency |
|---|---|
| L1 | 2–3 cycles |
| L2 | tens of cycles |
| L3 | tens to ~100 cycles |
| DRAM | 200–300 cycles |
Why DRAM stays slow. DRAM is 1 transistor + 1 capacitor per bit: cheap and dense but physically slow (capacitors need refreshes, rows need to be activated). SRAM is 6 transistors per bit: fast but uses ~6× the area.
SRAM really uses 6x the area??
That’s why SRAM is used only for on-die caches. DDR improved bandwidth (two transfers per cycle), not latency.
Mitigations (all responses to the wall):
- Multi-level cache hierarchy (L1/L2/L3) with staged latency
- Miss shadow: useful work during a pending load
- OoO + renaming: overlap multiple misses
- Prefetching: pull likely-needed data into cache early
- SSDs/nonvolatile memory: the next layer down is also getting closer to DRAM speed
From ECE459 L06. One of the four walls alongside the power wall, ILP limits, and the speed of light.