Wait-Free Programming
A data structure is wait-free if every thread performing an operation completes it in a bounded number of steps, regardless of what other threads do. It is strictly stronger than lock-free.
Why isn't lock-free enough?
Lock-free only guarantees system-wide progress: some thread always commits. An individual unlucky thread can lose every CAS race forever; that is busy-wait starvation, and it breaks rule 5 (bounded waiting). Wait-free closes that gap: every thread, not just some thread, has a step bound.
Lock-free vs wait-free
| | lock-free | wait-free |
|---|---|---|
| per-op cost | constant number of atomic ops (amortized) | constant number of atomic ops (worst case) |
| progress | some thread completes | every thread completes |
| starvation | possible (rule 5 broken) | impossible (rule 5 holds) |
| ease of implementation | CAS retry loop | typically much harder |
A CAS retry loop is not wait-free because an unlucky thread’s retry count is unbounded. So the usual lock-free stack / queue / list are all lock-free but not wait-free.
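To make the unbounded retry concrete, here is a minimal sketch of a lock-free increment built on `compare_exchange_weak` (C++'s CAS); the retry loop, not the atomicity, is what forfeits the wait-free bound:

```cpp
#include <atomic>

std::atomic<int> ctr{0};

// Lock-free but NOT wait-free: some thread always succeeds, but a
// particular unlucky thread can lose the CAS race on every iteration,
// so its retry count has no bound.
void inc_lockfree() {
    int seen = ctr.load();
    while (!ctr.compare_exchange_weak(seen, seen + 1)) {
        // a failed CAS reloads the current value into `seen`; retry
    }
}
```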
What is wait-free, then?
Any operation that maps directly to a single hardware atomic and doesn’t loop:
- FetchAdd: a counter; the increment is one instruction, bounded
- Swap: a single-slot mailbox
- TestSet: an atomic flag flip
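The last two bullets can be sketched in C++ (the names `mailbox`, `post`, and `try_claim` are illustrative, not from any library):

```cpp
#include <atomic>

// Swap as a single-slot mailbox: exchange is one atomic instruction,
// so posting a message is wait-free.
std::atomic<void*> mailbox{nullptr};
void* post(void* msg) { return mailbox.exchange(msg); } // returns the previous occupant

// TestSet as an atomic flag flip: test_and_set is one atomic instruction.
std::atomic_flag busy = ATOMIC_FLAG_INIT;
bool try_claim() { return !busy.test_and_set(); } // true iff we flipped clear -> set
```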
```cpp
#include <atomic>

// wait-free counter (fetch-and-increment)
std::atomic<int> counter{0};
void bump() { counter.fetch_add(1); } // one atomic instruction, bounded
```

General-purpose wait-free queues and lists exist (Kogan-Petrank, Herlihy's universal construction), but the constructions are heavy (announcement arrays, per-thread helping protocols) and mostly academic. In practice, "wait-free" in production code means "this op is a single atomic instruction."
Why you almost never use wait-free
- Construction cost: general wait-free structures trade constant factors for the worst-case bound; throughput can be 10-100x worse than lock-free
- Narrow use cases: you only need wait-free when starvation is unacceptable (real-time systems, signal handlers, cross-boundary code where one thread must not delay another)
- The guarantee you usually want is lock-free: system-wide progress plus some ability to detect hot threads and back off
ECE459 framing
ECE459 L12 introduces wait-free via fetch_add/fetch_sub as bounded-step atomic ops:
```cpp
#include <atomic>
#include <iostream>

std::atomic<int> ctr;

void inc() { ctr.fetch_add(1, std::memory_order_seq_cst); }
void dec() {
    int old = ctr.fetch_sub(1, std::memory_order_seq_cst);
    if (old == 1) std::cout << "all done\n";
}
```
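A possible way to drive this pair (the `inc`/`dec` pair is repeated for self-containment; the helper `run_workers` and the worker count are illustrative, not from the course notes): register n units of work up front, then let n workers each retire one unit; exactly one worker, the one that sees `old == 1`, prints the message.

```cpp
#include <atomic>
#include <iostream>
#include <thread>
#include <vector>

std::atomic<int> ctr{0};

void inc() { ctr.fetch_add(1, std::memory_order_seq_cst); }
void dec() {
    int old = ctr.fetch_sub(1, std::memory_order_seq_cst);
    if (old == 1) std::cout << "all done\n"; // only the last decrementer fires
}

// hypothetical driver: n increments up front, one decrement per worker
void run_workers(int n) {
    for (int i = 0; i < n; ++i) inc();
    std::vector<std::thread> workers;
    for (int i = 0; i < n; ++i) workers.emplace_back(dec);
    for (auto& t : workers) t.join(); // ctr is back to 0 here
}
```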