ABA Problem
The ABA problem is a false-positive CAS failure: a thread reads A, something else writes B then writes A back, and the thread’s later compare-and-swap succeeds as if nothing changed. Not an acronym, just the pattern A then B then A.
Covered in ECE459 L12 and in CS343.
Why does this break CAS?
CAS retries are supposed to fire whenever the location changed. An A→B→A round-trip silently defeats that, so the algorithm commits against state that was secretly replaced underneath it.
- This ultimately breaks Lock-Free Programming, which relies heavily on CAS, so you need to be really careful
- CAS alone isn’t enough. You need CAS plus tagging, LL/SC, or SMR (more below)
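A minimal sketch of the false positive, simulated in one thread with `std::atomic<int>` (0 standing in for A, 1 for B):

```cpp
#include <atomic>

// Simulate the victim/meddler interleaving inline: the CAS compares
// values only, so it cannot see the 0 -> 1 -> 0 round-trip.
bool aba_cas_succeeds() {
    std::atomic<int> x{0};      // location holds A (= 0)
    int expected = x.load();    // victim reads A
    x.store(1);                 // meddler writes B (= 1)
    x.store(0);                 // meddler writes A back
    // False positive: the location reads A, so the CAS succeeds
    // even though it was mutated twice in between.
    return x.compare_exchange_strong(expected, 2);
}
```

`aba_cas_succeeds()` returns true: a plain CAS cannot distinguish "unchanged" from "changed and changed back".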
When does this actually happen in prod?
Only when you use CAS on a recyclable value, almost always a pointer that the allocator hands back after a free. In practice that’s when you roll your own lock-free data structure with raw `compare_exchange`. If you’re using `std::mutex`, `crossbeam`, channels, or any vetted lock-free crate, someone has already solved it for you.
- Non-GC languages (C / C++ / Rust) with hand-rolled Treiber stacks, Michael-Scott queues, freelists
- Aggressive allocators (jemalloc, tcmalloc) reuse freed addresses fast, so A→free→malloc→A is common
- GC’d languages (Java, Go) mostly dodge it: the GC won’t free a node while a thread still holds a reference
Background: https://www.baeldung.com/cs/aba-concurrency
Walkthrough: lock-free stack corruption
Concrete walkthrough from CS343 / Buhr. Two threads share a stack whose nodes are A → B → C (top points at A). Both use this lock-free pop():
```cpp
Node *Stack::pop() {
    Node *t;
    for ( ;; ) {
        t = top;                                  // read current top
        if ( t == nullptr ) return nullptr;       // empty stack
        if ( CAS( top, t, t->next ) ) return t;   // swing top to t->next
    }
}
```

The idea: read top, stage top->next as the new top, then CAS. If top changed in between, CAS fails and we retry.
Step 1. Thread 1 starts a pop. It reads `t = A` and `t->next = B`, so it’s about to run `CAS(top, A, B)`.
Step 2. Thread 1 is preempted before the CAS fires.
Step 3. Thread 2 wakes up and runs three operations to completion on the same stack:
| op | resulting stack | note |
|---|---|---|
| pop() | B → C | node A is freed |
| pop() | C | node B is freed |
| push(A) | A → C | reallocates a node at A’s old address, but now A->next == C |
Step 4. Thread 1 resumes and runs `CAS(top, A, B)`. Since `top == A` again, the CAS succeeds and swings top to B.
But B was freed in step 3. Its memory has been handed back to the allocator and may hold unrelated data. Stack corrupted.
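The four steps can be replayed deterministically in one thread; reusing node A’s object stands in for the allocator handing its address back (the names `Node` and `pop_leaves_dangling_top` are mine, not the lecture’s):

```cpp
struct Node { Node *next; bool freed = false; };

// Replay the walkthrough: Thread 1 pauses between its read and its CAS
// while Thread 2 pops A, pops B, and re-pushes the recycled node A.
bool pop_leaves_dangling_top() {
    Node C{nullptr}, B{&C}, A{&B};
    Node *top = &A;                  // stack: A -> B -> C

    Node *t = top;                   // Thread 1: reads t = A
    Node *staged = t->next;          // Thread 1: stages B, then is preempted

    top = top->next;                 // Thread 2: pop() removes A...
    A.freed = true;                  // ...and frees it
    top = top->next;                 // Thread 2: pop() removes B...
    B.freed = true;                  // ...and frees it
    A = Node{top};                   // Thread 2: push(A) reuses A's address
    top = &A;                        // stack: A -> C

    if (top == t) top = staged;      // Thread 1: CAS succeeds (top == A again)
    return top->freed;               // top now points at the freed node B
}
```

The function returns true: the stack head is a dangling pointer.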
Generic pattern
This is a 4-step race between processes P1, P2, P3 on a single memory location M [DPS10]:
- P1 reads value A from M
- P2 interrupts P1; stores B into M
- P3 stores A back into M
- P1 resumes and executes a false-positive CAS: the location still reads A, so the CAS succeeds even though the value went through B in the meantime

P2 and P3 may be the same process. The pattern just needs one victim and at least one meddler that cycles the location through B back to A. In the stack walkthrough above, Thread 2 plays both P2 and P3, with M = top and A a pointer to a node that gets freed and re-pushed. The generic pattern applies to any shared location, not just stack tops.
Fix 1: Tagging (CASVD / double-wide CAS)
The lecture’s term is tagging: attach a nonce (monotonically increasing counter) to the value on every write, and CAS both the value and the nonce atomically. This is done via double-wide CAS (CASVD), which compares a value+counter pair in one instruction.
Pack top and count into one 64/128-bit atomic. Every push increments count, so a re-pushed node carries a different tag and CASVD fails correctly.
Replaying the walkthrough with tagging:
- Thread 1 reads ⟨A, c⟩ (pointer plus counter)
- Thread 2 does pop, pop, push → the push bumps the counter, so the state is ⟨A, c+1⟩
- Thread 1 runs CASVD(link, ⟨A, c⟩, ⟨B, c⟩). Since the stored pair is ⟨A, c+1⟩ ≠ ⟨A, c⟩, the CAS fails and Thread 1 retries
```cpp
class Stack {
    union Link {
        struct { Node *top; uintptr_t count; };   // 2 × 32/64 bits
        uintS_t atom;                             // one 64/128-bit word (lecture's notation)
    } link;
  public:
    struct Node { /* data */ Link next; };
    void push( Node &n ) {
        n.next = link;
        for ( ;; ) {
            if ( CASVD( link.atom, n.next.atom,
                        (Link){ &n, n.next.count + 1 }.atom ) ) break;
        }
    }
    Node *pop() {
        Link t = link;
        // CASVD refreshes the comparand on failure (like compare_exchange),
        // so each retry sees the current <top, count> pair
        for ( ;; ) {
            if ( t.top == nullptr ) return nullptr;
            if ( CASVD( link.atom, t.atom,
                        (Link){ t.top->next.top, t.count }.atom ) ) return t.top;
        }
    }
};
```

- counter is bumped in push only, which is enough: pop just needs to detect any intervening mutation, and every removal-then-re-add sequence crosses at least one push
- `n.next = link` and `Link t = link` don’t need to be atomic; CASVD re-validates them
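A portable stand-in for CASVD (my sketch, not the lecture’s code): pack a 32-bit node index and a 32-bit counter into one 64-bit atomic, then replay the race. Indices 0/1/2 stand in for pointers to A/B/C:

```cpp
#include <atomic>
#include <cstdint>

// Pack (index, count) into one CAS-able 64-bit word.
using Tagged = std::uint64_t;
constexpr Tagged tag(std::uint32_t idx, std::uint32_t count) {
    return (Tagged(count) << 32) | idx;
}

bool tagged_cas_detects_aba() {
    std::atomic<Tagged> top{tag(0, 7)};   // <A, 7>
    Tagged seen = top.load();             // Thread 1 snapshots <A, 7>
    top.store(tag(1, 7));                 // pop A  -> <B, 7> (pop keeps count)
    top.store(tag(2, 7));                 // pop B  -> <C, 7>
    top.store(tag(0, 8));                 // push A -> <A, 8>: push bumps count
    // Same pointer, different tag: the double-wide compare fails correctly.
    return !top.compare_exchange_strong(seen, tag(1, 7));
}
```

`tagged_cas_detects_aba()` returns true: the re-pushed A carries a fresh tag and the stale CAS is rejected.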
Probabilistic only
The counter is finite. After 2^32 (or 2^64) pushes the counter wraps back to the stored value; if Thread 1 wakes with that node still on top, ABA strikes again. Vanishingly unlikely with a 64-bit counter on realistic traffic.
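The wrap can be made observable by shrinking the tag to 8 bits, so 256 pushes cycle the counter back to its starting value (a toy sketch, indices standing in for pointers):

```cpp
#include <atomic>
#include <cstdint>

// 8-bit index + 8-bit counter in one 16-bit word; 256 pushes wrap the tag.
using Tiny = std::uint16_t;
constexpr Tiny tag8(std::uint8_t idx, std::uint8_t count) {
    return Tiny((Tiny(count) << 8) | idx);
}

bool wrapped_tag_misses_aba() {
    std::atomic<Tiny> top{tag8(0, 7)};                // <A, 7>
    Tiny seen = top.load();                           // victim snapshots <A, 7>
    for (int i = 0; i < 256; ++i)                     // 256 pushes of A...
        top.store(tag8(0, std::uint8_t(8 + i)));      // ...bump the counter
    // The 8-bit counter has wrapped back to 7: ABA slips through again.
    return top.compare_exchange_strong(seen, tag8(1, 7));
}
```

With a 64-bit counter the same wrap exists in principle, which is why tagging is only a probabilistic fix.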
Fix 2: LL/SC (hardware-tracked cell)
LL/SC tracks the memory cell via a hardware reservation rather than comparing values: load-linked (LL) takes the reservation, and store-conditional (SC) succeeds only if no intervening write touched the address, even if the value lands back on A. No counter, no wrap, no probabilistic caveat.
- MIPS / ARM / PowerPC / RISC-V ship LL/SC
- x86 does not; it only has `LOCK CMPXCHG` (CAS) and accepts the ABA cost, with CASVD as the practical workaround
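The usual way LL/SC surfaces in portable code is the `_weak` retry loop (a generic sketch, here a lock-free increment): on LL/SC targets each iteration compiles to an LL/SC pair, and a lost reservation just shows up as one extra spin.

```cpp
#include <atomic>

// Lock-free fetch-and-increment via the canonical _weak retry loop.
// On ARM/POWER/RISC-V compare_exchange_weak lowers to LL/SC, so a
// spurious failure (invalidated reservation) simply retries.
int fetch_increment(std::atomic<int> &x) {
    int cur = x.load(std::memory_order_relaxed);
    while (!x.compare_exchange_weak(cur, cur + 1,
                                    std::memory_order_acq_rel,
                                    std::memory_order_relaxed)) {
        // on failure, cur is reloaded with the current value; just retry
    }
    return cur;   // value before the increment
}
```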
Fix 3: Hazard pointers / SMR
Doesn’t fix ABA directly, but removes its consequence. If popped nodes are only freed after every thread has proven it holds no reference, then even an ABA-confused CAS doesn’t dereference garbage. See Safe Memory Reclamation.
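A toy single-reader sketch of the idea (names mine; real SMR such as Folly’s handles many threads, per-thread slots, and retirement thresholds): the reader publishes the pointer it is about to dereference, and reclamation skips anything published.

```cpp
#include <atomic>
#include <vector>

struct Node { int value; };

std::atomic<Node *> hazard{nullptr};   // the single reader's published pointer
std::vector<Node *> retired;           // popped nodes awaiting reclamation

void retire(Node *n) { retired.push_back(n); }

// Free every retired node that no hazard pointer protects; return the count.
int scan_and_reclaim() {
    int freed = 0;
    std::vector<Node *> deferred;
    for (Node *n : retired) {
        if (n == hazard.load()) deferred.push_back(n);  // still in use: defer
        else { delete n; ++freed; }
    }
    retired = std::move(deferred);
    return freed;
}
```

Even if an ABA-confused CAS leaves a stale pointer in hand, the node it names is never freed while published, so dereferencing it stays safe.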
Fix 4: Application-level modification counters
Same idea as CASVD but at a higher level: keep a monotonically-increasing version next to the protected state, and readers snapshot it. Java’s ArrayList / HashMap iterators do this: if the mod count changed since the iterator was created, they throw ConcurrentModificationException rather than silently reading stale structure.
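A minimal C++ analogue of the fail-fast pattern (hypothetical `VersionedList`, not any real library): mutations bump a version, iterators snapshot it and refuse to continue once the snapshot is stale.

```cpp
#include <cstddef>
#include <stdexcept>
#include <vector>

// App-level modification counter: every structural change bumps mod_count,
// and an iterator created earlier detects the bump and bails out.
struct VersionedList {
    std::vector<int> items;
    long mod_count = 0;

    void push_back(int v) { items.push_back(v); ++mod_count; }

    struct Iterator {
        const VersionedList *list;
        std::size_t pos;
        long expected_mod;   // snapshot taken at creation
        int next() {
            if (expected_mod != list->mod_count)
                throw std::runtime_error("concurrent modification");
            return list->items[pos++];
        }
    };
    Iterator begin() const { return Iterator{this, 0, mod_count}; }
};
```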
Which fix is used in practice?
Depends on the platform and how much you care about memory reclamation:
- Tagging (CASVD) is the go-to on x86, which lacks LL/SC. `cmpxchg16b` gives 128-bit double-wide CAS. Java’s `AtomicStampedReference` and most Windows `Interlocked*` APIs work this way
- LL/SC is the native hardware answer on ARM, POWER, RISC-V. Rust/C++ `std::atomic::compare_exchange_weak` lowers to LL/SC on these targets, which is why `_weak` can spuriously fail (the reservation gets invalidated)
- Hazard pointers / SMR is what serious lock-free libraries use when the data structure involves dynamic memory: Facebook’s Folly, the proposed C++26 hazard-pointer API, Linux kernel RCU (a related idea for read-heavy structures). Rust’s crossbeam crate uses epoch-based reclamation, a cousin of hazard pointers
- App-level mod counters are used for correctness-flagging rather than lock-free data structures: Java collections throw on stale iterators instead of trying to be lock-free
Rule of thumb
For a lock-free container in production: hazard pointers or epoch-based reclamation for the free-safety problem, plus CASVD or LL/SC (whichever the target CPU supports) for the ABA detection problem. They solve different halves and are typically combined.