Safe Memory Reclamation (SMR)
Safe Memory Reclamation (SMR) solves the lifetime problem in lock-free data structures: a thread reads a pointer, is interrupted, another thread frees the node, the first thread resumes and dereferences a freed address ⇒ segfault. SMR determines when a node has no references and is safe to reclaim — the garbage-collection problem, but done by hand.
Why isn't free-after-CAS enough?
Because between copying
topand issuing CAS, a thread is vulnerable. Pop sequence:t = top; <interrupted> if ( CAS( top, t, t->next ) ) return t; // t->next: t may be freed!Another thread could pop
t,freeit, and the OS returns the page. When our thread resumes and readst->next, the address is invalid.
Hazard pointers — the canonical SMR
Each thread publishes a small array of hazard pointers: addresses it’s currently accessing.
- Thread wants to dereference
p⇒ writespto its hazard list. - Thread reads
p->.... - When done, clears the hazard slot.
- To retire a node
n, a thread hides it on a private list. Periodically (every ~K retires), the thread scans all threads’ hazard lists. Nodes absent from every hazard list are truly unreferenced and can befreed.
For the lock-free stack
Addresses x, y, z are nodes.
- First thread puts
xon its hazard list before dereferencing. - Second thread cannot reuse
x(it’s on a hazard list). - Second thread allocates a fresh node at a different address.
- First thread eventually notices the change (CAS fails), retries with the new pointer.
Cost and complexity
- Advanced lock-free structures (queue, deque, lock-free tree) need hundreds of lines of SMR glue.
- Scan cost: O(#threads × hazard_slots) per batch retire, amortized over K retires.
- Alternatives: epoch-based reclamation (Fraser), RCU (Linux kernel), reference counting (often simpler but with its own CAS overhead).
Lock-free vs locking — SMR is the honest tax
| locking | lock-free (with SMR) | |
|---|---|---|
| critical sections | arbitrary complexity | limited set |
| deadlock | possible (hold-and-wait) | impossible (no ownership) |
| progress under preemption | blocked owner stalls everyone | others make progress |
| memory reclamation | trivial (critical section owns lifetime) | needs SMR — hundreds of LOC |
| performance | comparable in general case | comparable in general case |
Lock-free without SMR breaks rule 5 (eventual progress) — a stalled memory-allocator or OOM from un-reclaimable nodes defeats the point.