Lock-Free Programming

Wait-Free Programming

A data structure is wait-free if every thread performing an operation completes it in a bounded number of steps, regardless of what other threads do. It is strictly stronger than lock-free.

Why isn't lock-free enough?

Lock-free only guarantees system-wide progress: some thread always commits. An individual unlucky thread can lose every CAS race forever; that’s busy-wait starvation, and it breaks rule 5 (bounded waiting). Wait-free closes that gap: every thread, not just some thread, has a step bound.

Lock-free vs wait-free

|                        | lock-free                        | wait-free                         |
|------------------------|----------------------------------|-----------------------------------|
| per-op cost            | constant atomic time (amortized) | constant atomic time (worst case) |
| progress               | some thread completes            | every thread completes            |
| starvation             | possible (rule 5 broken)         | impossible (rule 5 holds)         |
| ease of implementation | CAS retry loop                   | typically much harder             |

A CAS retry loop is not wait-free because an unlucky thread’s retry count is unbounded. So the usual lock-free stack / queue / list are all lock-free but not wait-free.

What is wait-free, then?

Any operation that maps directly to a single hardware atomic and doesn’t loop:

  • FetchAdd as a counter: the increment is one instruction, so the step count is bounded
  • Swap as a single-slot mailbox
  • TestSet as an atomic flag flip

// wait-free counter (fetch-and-increment)
std::atomic<int> counter{0};
void bump() { counter.fetch_add( 1 ); }   // single atomic RMW, bounded
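The Swap-as-mailbox item can be sketched the same way (a sketch with names of my choosing, assuming `std::atomic::exchange` as the Swap primitive): both sides do exactly one atomic exchange, so each operation is wait-free.

```cpp
#include <atomic>

// Single-slot mailbox built on atomic exchange (Swap).
// Deposit and take are each one atomic instruction: wait-free.
std::atomic<int*> mailbox{nullptr};

// publish msg; returns the previous occupant (nullptr if the slot was empty)
int* deposit(int* msg) { return mailbox.exchange(msg); }

// take whatever is in the slot, leaving it empty
int* take() { return mailbox.exchange(nullptr); }
```

Note that a full slot is overwritten rather than waited on; that is the price of keeping the operation a single Swap.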

General-purpose wait-free queues and lists exist (Kogan-Petrank, Herlihy’s universal construction), but the constructions are heavy (announcement arrays, per-thread helping protocols) and mostly academic. In practice, “wait-free” in production code means “this op is a single atomic instruction.”

Why you almost never use wait-free

  • Construction cost: general wait-free structures trade constant factors for the worst-case bound; throughput can be 10-100x worse than lock-free
  • Narrow use cases: you only need wait-free when starvation is unacceptable (real-time systems, signal handlers, cross-boundary code where one thread must not delay another)
  • The guarantee you usually want is lock-free: system-wide progress plus some ability to detect hot threads and back off

ECE459 framing

ECE459 L12 introduces wait-free via fetch_add/fetch_sub as bounded-step atomic ops:

#include <atomic>
#include <iostream>

std::atomic<int> ctr{0};
void inc() { ctr.fetch_add( 1, std::memory_order_seq_cst ); }
void dec() {
    int old = ctr.fetch_sub( 1, std::memory_order_seq_cst );
    if ( old == 1 ) std::cout << "all done\n";   // this was the last decrement
}