Memory Consistency Model

A memory consistency model is the contract that specifies which orderings of memory operations are observable across threads. Covered in CS343 §10 and ECE459 L15.

Why do we need one?

Without a model, “event A happens before event B” has no meaning across threads. The model names the guarantees so locks, atomics, and fences can be reasoned about.

Relaxation models

Hardware models differ in which reorderings they allow between loads and stores to different addresses, and in whether cache updates may be delayed (so reads can return stale values):

Model                        | W→R | R→W | W→W | Lazy cache | Real hardware
Atomic Consistent (AT)       |     |     |     |            | slow/impossible (distributed)
Sequential Consistency (SC)  |     |     |     | yes        | none natively; strongest useful
Total Store Order (TSO)      | yes |     |     | yes        | x86, SPARC
Partial Store Order (PSO)    | yes |     | yes | yes        |
Weak Order (WO)              | yes | yes | yes | yes        | ARM, Alpha
Release Consistency (RC)     | yes | yes | yes | yes        | PowerPC + explicit atomic R/W syncs
  • AT: events occur instantaneously, impossible on real distributed hardware
  • SC: events are not instantaneous, so reads may be stale, but Dekker/Peterson still work
  • TSO (x86): only write-then-read can be reordered (store buffer). Most SC software still works with fences at critical W→R boundaries
  • WO (ARM): all four reorderings allowed, so software mutex requires fences
  • RC: explicit acquire/release points, hardware is free between them

Key principle

No user races + strong locks ⇒ SC semantics. Build locks with hardware atomics plus fences, protect all shared data with locks, and programmer-visible behaviour collapses back to SC even on a WO machine.

Where reorderings come from

  • Compiler: moves unrelated instructions to fill load-delay slots, and may do anything under undefined behaviour
  • Hardware: the CPU runs instructions in whatever order it finds convenient
  • Cross-thread visibility: thread A’s read can be reordered before thread B’s write becomes visible, so A sees stale data

Programming against it

Rust adopts C++'s memory model (Rust's std::sync::atomic::Ordering mirrors C++'s std::memory_order on std::atomic). The orderings are RC-style: at each atomic op you pick which reorderings are blocked, trading ordering strength for speed.

Relaxed (memory_order_relaxed)

Atomicity only, no ordering. The op itself is indivisible (no torn reads/writes), but the compiler and CPU can move unrelated accesses freely across it. No happens-before is established with other threads. Use it for independent counters:

hits.fetch_add(1, memory_order_relaxed);  // metrics counter, nobody sequences on this

Acquire (memory_order_acquire)

Applies to loads. Forbids later reads/writes from being reordered before this load. Think of it as “after I’ve loaded this value, everything below must stay below.”

Release (memory_order_release)

Applies to stores. Forbids earlier reads/writes from being reordered after this store. Think of it as “before I publish this value, everything above must actually be done.”

Acquire + Release compose into the publish/subscribe pattern that makes most lock-free code work:

data = 123;                           // (1) above the release
flag.store(1, memory_order_release);  // (2) nothing above can slip below
// ... another thread ...
while (flag.load(memory_order_acquire) == 0) {}  // (3) nothing below can slip above
assert(data == 123);                  // (4) guaranteed to see (1)

AcqRel (memory_order_acq_rel)

Both acquire and release, for read-modify-write ops like fetch_add that logically do a load then a store.

SeqCst (memory_order_seq_cst, default)

Strongest. Acquire + Release plus a single total order across all SeqCst ops on all threads. Needed when you have multiple independent flags and the code relies on their relative ordering. Expensive on weak-memory hardware (ARM inserts full fences).

Rule of thumb

Use SeqCst until profiling proves otherwise. Drop to Acquire/Release for publish/subscribe patterns. Drop to Relaxed only for counters nobody sequences on.

Real bug (Crossbeam PR #98)

A lock-free queue reported garbage in registers. The fix: the load of ready needed at least Acquire, and the store needed Release. Without them, the thread parked too early [O’C18].

Fences in practice

  • sfence / lfence / mfence: x86 store/load/full fences
  • Java volatile and C++11 std::atomic with default memory_order_seq_cst insert fences automatically
  • __asm__ __volatile__("" ::: "memory"): compiler barrier only, no hardware fence