Sequential Optimization
From L14. Sequential optimizations are transformations a compiler or CPU applies assuming a single thread of execution. They preserve the meaning of the program for that thread in isolation, but can silently break correctness the moment another thread observes the same memory. Every concurrent-memory bug in this course traces back to one of these.
Why does "single-threaded correct" ever become "multi-threaded wrong"?
Because the optimizer’s definition of “correct” is the as-if rule: the observable behaviour of one thread (return values, I/O) matches the source program. Internal memory order is free to change. That freedom is fine when no one else reads your memory, but the moment a second thread does, its observations are the optimizer’s blind spot. Your `while (!flag)` loop works in isolation; optimized, `flag` is cached in a register and the loop spins forever.
The three families
- Eliding, delete operations whose result the thread “doesn’t need”: redundant loads/stores, cached register copies, provably unread writes. Kills spin-wait on unsynchronized flags.
- Disjoint Reordering, swap independent instructions (W→R, R→W, W→W, R→R) to fill pipeline slots or overlap latency. Breaks Dekker/Peterson, producer-consumer, double-check locking.
- Replication / caching, keep values in registers or per-core caches; see Replication. Helpful by default, harmful when readers need fresh shared state.
Why the compiler is so aggressive
- Register allocation, a loop variable reloaded every iteration is much slower than one kept in a register. The compiler caches unless told otherwise (`volatile`, atomics, fences).
- Instruction scheduling, modern pipelines need independent instructions to execute in parallel; reordering exposes that independence.
- Dead-code elimination, proving a store is unread lets the compiler drop it and the computation that produced it.
The defence
- Declare shared variables as atomics (C++11 `std::atomic`, Java `volatile`); this forces the compiler to emit real loads/stores and honours ordering.
- Insert memory barriers at synchronization points.
- Trust well-engineered locks, they already contain the right fences.
- See Memory Consistency Model for the overall contract.