Miss Shadow
A miss shadow is the window between a load being issued and the loaded value being used, during which a modern CPU keeps executing unrelated instructions instead of stalling.
Why?
A load from DRAM takes 200–300 cycles. A naive CPU that stalls the instant it can't find the value in cache would throw away hundreds of cycles per miss, and in cache-miss-dominated code, that's most of the runtime. The miss shadow is how the CPU gets useful work done while waiting.
Cost of a load depends on where the value lives:
| Where the value lives | Cost |
|---|---|
| L1 cache | 2–3 cycles |
| L2 / L3 | in between |
| DRAM | 200–300 cycles |
Naive CPU: issue the load, stall until the value arrives in the destination register.
Modern CPU: keep issuing instructions. Hardware tracks "this register isn't ready yet"; only stall when an instruction actually reads it.
ld rax, [mem] ; MISS: value arrives in ~200 cycles
add rbx, 16 ; runs in the shadow (doesn't touch rax)
cmp rcx, 0 ; runs in the shadow
jeq label ; runs in the shadow (speculated)
...
mov rdx, rax ; ← first USE of rax. Stalls if not ready.
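To make the difference between the two policies concrete, here is a minimal toy model in Python. It is a sketch under stated assumptions, not a description of any real pipeline: instructions issue in order, at most one per cycle, the miss latency is fixed at 200 cycles, and branch speculation is not modelled. The names `Insn`, `simulate`, and `stall_on_use` are made up for this illustration.

```python
from dataclasses import dataclass
from typing import Optional

@dataclass
class Insn:
    name: str
    dst: Optional[str]   # register written, if any
    src: Optional[str]   # register read, if any
    latency: int         # cycles from issue until dst is ready

def simulate(prog, stall_on_use):
    """Return the cycle at which the last result becomes available.

    Toy model (assumption): in-order issue, at most one instruction per cycle.
    """
    ready = {}   # register -> cycle at which its value is available
    t = 0        # earliest cycle the next instruction may issue
    done = 0
    for insn in prog:
        issue = t
        if insn.src is not None:
            # An instruction that reads a pending register waits for it.
            issue = max(issue, ready.get(insn.src, 0))
        finish = issue + insn.latency
        if insn.dst is not None:
            ready[insn.dst] = finish
        done = max(done, finish)
        # Modern CPU: move on next cycle. Naive CPU: block until the result exists.
        t = issue + 1 if stall_on_use else finish
    return done

MISS = 200  # assumed DRAM miss latency in cycles

prog = [
    Insn("ld  rax, [mem]", "rax",   None,    MISS),
    Insn("add rbx, 16",    "rbx",   "rbx",   1),
    Insn("cmp rcx, 0",     "flags", "rcx",   1),
    Insn("jeq label",      None,    "flags", 1),
    Insn("mov rdx, rax",   "rdx",   "rax",   1),
]

naive = simulate(prog, stall_on_use=False)
modern = simulate(prog, stall_on_use=True)
print(naive, modern)  # 204 201
```

In this model the shadow saves only three cycles, because only three independent instructions sit between the load and its first use. The shadow hides as much latency as there is independent work to fill it.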
Multiple loads in flight. The CPU isn't limited to one pending miss. Depending on the architecture, two or more loads can be outstanding at once, so a second ld inside the shadow starts its own shadow, and the two miss latencies overlap instead of stacking. This is what out-of-order (OoO) execution exploits to turn cache-miss-dominated code from "wait, wait, wait" into "wait once for several misses at once."
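The overlap can be sketched with a small cost model. This assumes every load misses, all loads are independent of each other, one load can issue per cycle, and the hardware supports a fixed number of outstanding misses. The function `total_cycles` and its parameter `max_outstanding` are hypothetical names for this sketch, and real limits vary by architecture.

```python
import heapq

def total_cycles(n_loads, miss_latency, max_outstanding):
    """Cycle at which the last of n independent load misses completes."""
    in_flight = []  # min-heap of completion cycles for pending misses
    t = 0           # earliest cycle the next load may issue
    last = 0
    for _ in range(n_loads):
        if len(in_flight) == max_outstanding:
            # Every miss slot is busy: wait for the oldest miss to return.
            t = max(t, heapq.heappop(in_flight))
        finish = t + miss_latency
        heapq.heappush(in_flight, finish)
        last = max(last, finish)
        t += 1  # the next load can issue one cycle later
    return last

print(total_cycles(4, 200, 1))  # 800: misses stack ("wait, wait, wait")
print(total_cycles(4, 200, 2))  # 401: misses overlap in pairs
print(total_cycles(4, 200, 4))  # 203: wait once for all four
```

With a single outstanding miss the four latencies add up. With four slots the total is barely more than one miss latency.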
From ECE459 L06.