Memory Reordering

Disjoint Reordering

Disjoint reordering is the compiler/hardware optimization of swapping two memory operations on different addresses (disjoint variables). Sequentially it’s invisible β€” same result. Concurrently it breaks every software mutex algorithm that communicates via shared flags (Dekker, Peterson, synchronization flags), because the reorder makes another thread see the writes in the wrong order.

Why can hardware reorder at all if locks are supposed to work?

Because hardware optimizes for the sequential perspective: pipelines, out-of-order execution, and store buffers all assume reordering disjoint operations is free. Locks β€œwork” when built on atomic instructions with fencing (test-and-set, CAS) β€” those are the hardware-provided primitives where the CPU agrees not to reorder. Plain loads/stores give no such guarantee.

How the four reorderings break things

For independent addresses x and y:

ReorderBreaks
R_x β†’ R_y allows R_y β†’ R_xHarmless β€” two reads, no observable effect.
W_x β†’ R_y allows R_y β†’ W_x[[notes/Dekker Algorithm
R_x β†’ W_y allows W_y β†’ R_xProducer-consumer: data = Data; Insert = true; reordered to Insert = true; data = Data; β‡’ consumer reads uninitialized data.
W_x β†’ W_y allows W_y β†’ W_x[[notes/Peterson Algorithm

Broken double-checked locking

int *ip = nullptr;                    // shared
if ( ip == nullptr ) {                // first check β€” no race
    lock.acquire();
    if ( ip == nullptr ) {            // second check β€” got lock, still null?
        ip = new int(0);              // allocate + initialize
    }
    lock.release();
}
// bug: `new int(0)` compiles to:
//   call malloc β†’ r1
//   st 0, (r1)           // initialize storage
//   st r1, ip            // initialize pointer
// hardware may reorder the two stores β‡’ another thread sees `ip` set
// before storage is initialized β‡’ reads uninitialized memory.

Fixes

  • Memory fences β€” __asm__ __volatile__( "mfence" ) on x86; __sync_synchronize(), C++11 std::atomic_thread_fence.
  • Atomic operations with ordering β€” C++11 std::atomic<T> with appropriate memory_order defaults to SC and inserts fences automatically.
  • Language-level volatile β€” Java volatile / C# volatile imply SC on that variable.