Concurrency

Lock-Free Programming

A lock-free data structure has operations that are critical sections, but they are performed without ownership: each operation commits in a single constant-time atomic step (Buhr §11.1). No thread holds a lock; mutation happens via a single atomic read-modify-write (typically CAS) in a retry loop.

Why call it lock-free if you're still spinning?

Because there’s no owned lock, no thread can hold one, die, and leave others stranded. But Buhr is explicit that “lock-free” is a misnomer: it is still locking. You’re spinning on a conceptual lock implemented via CAS, which is still busy-waiting. The “free” means no ownership, not no waiting.

The distinguishing properties come directly from that no-ownership shape:

  • no deadlock possible: hold-and-wait can’t happen without ownership
  • no underutilization on preemption: an interrupted thread doesn’t block others, who keep finishing their operations
  • limited set of critical sections: the whole mutation has to fit in one (possibly double-wide) atomic op. Stack, single-producer-single-consumer queue, counters: yes. Arbitrary tree rotation: no
  • no progress guarantee per thread: an unlucky thread can lose every CAS race forever. If you need that guarantee, go to wait-free
  • breaks rule 5 without SMR: freed nodes and dereference races eventually cause OOM or segfault, violating bounded waiting for the memory allocator

Core pattern: CAS retry loop

“If nobody changed this out from under me, commit; else reread and try again.” Canonical lock-free stack push from Buhr §11.1.2:

void Stack::push( Node &n ) {
    for ( ;; ) {                          // busy wait
        n.next = top;                     // link to current top
        if ( CAS( top, n.next, &n ) ) break;
    }
}

The three CAS variants from §11.1 differ in what they do on failure:

  • CAS(val, comp, nval): returns bool; the retry has to re-read
  • CASV(val, &comp, nval): on failure writes the current value into comp, saving the n.next = top reload
  • CASVD(...): double-wide; lets you pack (pointer, counter) and defeat ABA

See Lock-Free Stack for the full push/pop walkthrough.

Lock vs lock-free (Buhr §11.2 summary)

                          lock                     lock-free
  critical section scope  arbitrary                fits in one atomic op
  deadlock                possible                 impossible
  interrupted holder      stalls everyone          others progress
  memory reclamation      trivial                  needs SMR
  performance             comparable in general    comparable in general

The “no performance difference in general” is Buhr’s point: reach for lock-free because you need the progress properties, not because you think it’s faster.

Lock-free ≠ wait-free

  • Lock-free: some thread always makes progress; individuals can starve
  • Wait-free: every thread finishes in a bounded number of steps

The CAS retry loop above is lock-free but not wait-free.

Data structures worth knowing

  • Lock-Free Stack: the canonical example; push/pop via CAS on top
  • Lock-free queue: Michael-Scott (multi-producer/multi-consumer) or a simpler SPSC ring; needs more care than the stack
  • Lock-free counter: just fetch_add (also wait-free)

ECE459 framing

ECE459 L12 frames lock-free as: “if any thread suspended mid-operation still lets others finish theirs.” It also notes that restrictions are allowed: a queue permitting one appender and one remover, but not two removers, is still lock-free.

Lock-free is not about speed

Lock-free guarantees forward progress, not throughput. Sometimes CAS is faster than lock/unlock; sometimes the algorithm ends up slower. Reach for it when you must (signal handlers, thread-death tolerance, progress guarantees), not because it’s “fast.”

Resources: