ABA Problem
The ABA problem is a false-positive CAS failure: a thread reads A, something else writes B then writes A back, and the thread’s later compare-and-swap succeeds as if nothing changed. Not an acronym, just the pattern A then B then A.
Covered in ECE459 L12 and in CS343.
Why does this break CAS?
CAS retries are supposed to fire whenever the location changed. An A→B→A round-trip silently defeats that, so the algorithm commits against state that was secretly replaced underneath it.
- This ultimately breaks Lock-Free Programming, which relies heavily on CAS, so you need to be really careful
- CAS alone isn’t enough. You need CAS plus tagging, LL/SC, or SMR (more below)
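A minimal sketch of the false positive, simulated in one thread with `std::atomic<int>` (0 standing in for A, 1 for B):

```cpp
#include <atomic>

// Simulate the victim/meddler interleaving inline: the CAS compares
// values only, so it cannot see the 0 -> 1 -> 0 round-trip.
bool aba_cas_succeeds() {
    std::atomic<int> x{0};      // location holds A (= 0)
    int expected = x.load();    // victim reads A
    x.store(1);                 // meddler writes B (= 1)
    x.store(0);                 // meddler writes A back
    // False positive: the location reads A, so the CAS succeeds
    // even though it was mutated twice in between.
    return x.compare_exchange_strong(expected, 2);
}
```

`aba_cas_succeeds()` returns true: a plain CAS cannot distinguish "unchanged" from "changed and changed back".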
When does this actually happen in prod?
Only when you use CAS on a recyclable value, almost always a pointer that the allocator hands back after a free. In practice that’s when you roll your own lock-free data structure with raw `compare_exchange`. If you’re using `std::mutex`, `crossbeam`, channels, or any vetted lock-free crate, someone has already solved it for you.
- Non-GC languages (C / C++ / Rust) with hand-rolled Treiber stacks, Michael-Scott queues, freelists
- Aggressive allocators (jemalloc, tcmalloc) reuse freed addresses fast, so A→free→malloc→A is common
- GC’d languages (Java, Go) mostly dodge it: the GC won’t free a node while a thread still holds a reference
Background: https://www.baeldung.com/cs/aba-concurrency
Walkthrough: lock-free stack corruption
Concrete walkthrough from CS343 / Buhr. Two threads share a stack whose nodes are A → B → C (top points at A). Both use this lock-free pop():
```cpp
Node *Stack::pop() {
    Node *t;
    for ( ;; ) {
        t = top;                                  // read current top
        if ( t == nullptr ) return nullptr;       // empty stack
        if ( CAS( top, t, t->next ) ) return t;   // swing top to t->next
    }
}
```

The idea: read top, stage top->next as the new top, then CAS. If top changed in between, CAS fails and we retry.
Step 1. Thread 1 starts a pop. It reads `t = A` and `t->next = B`, so it’s about to run `CAS(top, A, B)`.
Step 2. Thread 1 is preempted before the CAS fires.
Step 3. Thread 2 wakes up and runs three operations to completion on the same stack:
| op | resulting stack | note |
|---|---|---|
| pop() | B → C | node A is freed |
| pop() | C | node B is freed |
| push(A) | A → C | reallocates a node at A’s old address, but now A->next == C |
Step 4. Thread 1 resumes and runs `CAS(top, A, B)`. Since `top == A` again, the CAS succeeds and swings top to B.
But B was freed in step 3. Its memory has been handed back to the allocator and may hold unrelated data. Stack corrupted.
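The four steps can be replayed deterministically in one thread; reusing node A’s object stands in for the allocator handing its address back (the names `Node` and `pop_leaves_dangling_top` are mine, not the lecture’s):

```cpp
struct Node { Node *next; bool freed = false; };

// Replay the walkthrough: Thread 1 pauses between its read and its CAS
// while Thread 2 pops A, pops B, and re-pushes the recycled node A.
bool pop_leaves_dangling_top() {
    Node C{nullptr}, B{&C}, A{&B};
    Node *top = &A;                  // stack: A -> B -> C

    Node *t = top;                   // Thread 1: reads t = A
    Node *staged = t->next;          // Thread 1: stages B, then is preempted

    top = top->next;                 // Thread 2: pop() removes A...
    A.freed = true;                  // ...and frees it
    top = top->next;                 // Thread 2: pop() removes B...
    B.freed = true;                  // ...and frees it
    A = Node{top};                   // Thread 2: push(A) reuses A's address
    top = &A;                        // stack: A -> C

    if (top == t) top = staged;      // Thread 1: CAS succeeds (top == A again)
    return top->freed;               // top now points at the freed node B
}
```

The function returns true: the stack head is a dangling pointer.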
Generic pattern
This is a 4-step race between processes P1, P2, P3 on a single memory location M [DPS10]:
- P1 reads value A from M
- P2 interrupts P1; stores B into M
- P3 stores A back into M
- P1 resumes and executes a false-positive CAS: the location still reads A, so the CAS succeeds even though the value went through B in the meantime

P2 and P3 may be the same process. The pattern just needs one victim and at least one meddler that cycles the location through B back to A. In the stack walkthrough above, Thread 2 plays both P2 and P3, with M = top and A a pointer to a node that gets freed and re-pushed. The generic pattern applies to any shared location, not just stack tops.
Fix 1: Tagging (CASVD / double-wide CAS)
The lecture’s term is tagging: attach a nonce (monotonically increasing counter) to the value on every write, and CAS both the value and the nonce atomically. This is done via double-wide CAS (CASVD), which compares a value+counter pair in one instruction.
Pack top and count into one 64/128-bit atomic. Every push increments count, so a re-pushed node carries a different tag and CASVD fails correctly.
Replaying the walkthrough with tagging:
- Thread 1 reads ⟨A, c⟩ (pointer plus counter)
- Thread 2 does pop, pop, push → the push bumps the counter, so the state is ⟨A, c+1⟩
- Thread 1 runs CASVD(link, ⟨A, c⟩, ⟨B, c⟩). Since the stored pair is ⟨A, c+1⟩ ≠ ⟨A, c⟩, the CAS fails and Thread 1 retries
```cpp
class Stack {
    union Link {
        struct { Node *top; uintptr_t count; };   // 2 × 32/64 bits
        uintS_t atom;                             // one 64/128-bit word (lecture's notation)
    } link;
  public:
    struct Node { /* data */ Link next; };
    void push( Node &n ) {
        n.next = link;
        for ( ;; ) {
            if ( CASVD( link.atom, n.next.atom,
                        (Link){ &n, n.next.count + 1 }.atom ) ) break;
        }
    }
    Node *pop() {
        Link t = link;
        // CASVD refreshes the comparand on failure (like compare_exchange),
        // so each retry sees the current <top, count> pair
        for ( ;; ) {
            if ( t.top == nullptr ) return nullptr;
            if ( CASVD( link.atom, t.atom,
                        (Link){ t.top->next.top, t.count }.atom ) ) return t.top;
        }
    }
};
```

- counter is bumped in push only, which is enough: pop just needs to detect any intervening mutation, and every removal-then-re-add sequence crosses at least one push
- `n.next = link` and `Link t = link` don’t need to be atomic; CASVD re-validates them
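A portable stand-in for CASVD (my sketch, not the lecture’s code): pack a 32-bit node index and a 32-bit counter into one 64-bit atomic, then replay the race. Indices 0/1/2 stand in for pointers to A/B/C:

```cpp
#include <atomic>
#include <cstdint>

// Pack (index, count) into one CAS-able 64-bit word.
using Tagged = std::uint64_t;
constexpr Tagged tag(std::uint32_t idx, std::uint32_t count) {
    return (Tagged(count) << 32) | idx;
}

bool tagged_cas_detects_aba() {
    std::atomic<Tagged> top{tag(0, 7)};   // <A, 7>
    Tagged seen = top.load();             // Thread 1 snapshots <A, 7>
    top.store(tag(1, 7));                 // pop A  -> <B, 7> (pop keeps count)
    top.store(tag(2, 7));                 // pop B  -> <C, 7>
    top.store(tag(0, 8));                 // push A -> <A, 8>: push bumps count
    // Same pointer, different tag: the double-wide compare fails correctly.
    return !top.compare_exchange_strong(seen, tag(1, 7));
}
```

`tagged_cas_detects_aba()` returns true: the re-pushed A carries a fresh tag and the stale CAS is rejected.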
Probabilistic only
The counter is finite. After 2^32 (or 2^64) pushes the counter wraps back to the stored value; if Thread 1 wakes with that node still on top, ABA strikes again. Vanishingly unlikely with a 64-bit counter on realistic traffic.
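The wrap can be made observable by shrinking the tag to 8 bits, so 256 pushes cycle the counter back to its starting value (a toy sketch, indices standing in for pointers):

```cpp
#include <atomic>
#include <cstdint>

// 8-bit index + 8-bit counter in one 16-bit word; 256 pushes wrap the tag.
using Tiny = std::uint16_t;
constexpr Tiny tag8(std::uint8_t idx, std::uint8_t count) {
    return Tiny((Tiny(count) << 8) | idx);
}

bool wrapped_tag_misses_aba() {
    std::atomic<Tiny> top{tag8(0, 7)};                // <A, 7>
    Tiny seen = top.load();                           // victim snapshots <A, 7>
    for (int i = 0; i < 256; ++i)                     // 256 pushes of A...
        top.store(tag8(0, std::uint8_t(8 + i)));      // ...bump the counter
    // The 8-bit counter has wrapped back to 7: ABA slips through again.
    return top.compare_exchange_strong(seen, tag8(1, 7));
}
```

With a 64-bit counter the same wrap exists in principle, which is why tagging is only a probabilistic fix.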
Fix 2: LL/SC (hardware-tracked cell)
LL/SC tracks the memory cell via a hardware reservation rather than comparing values: load-linked (LL) takes the reservation, and store-conditional (SC) succeeds only if no intervening write touched the address, even if the value lands back on A. No counter, no wrap, no probabilistic caveat.
- MIPS / ARM / PowerPC / RISC-V ship LL/SC
- x86 does not; it only has `LOCK CMPXCHG` (CAS) and accepts the ABA cost, with CASVD as the practical workaround
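The usual way LL/SC surfaces in portable code is the `_weak` retry loop (a generic sketch, here a lock-free increment): on LL/SC targets each iteration compiles to an LL/SC pair, and a lost reservation just shows up as one extra spin.

```cpp
#include <atomic>

// Lock-free fetch-and-increment via the canonical _weak retry loop.
// On ARM/POWER/RISC-V compare_exchange_weak lowers to LL/SC, so a
// spurious failure (invalidated reservation) simply retries.
int fetch_increment(std::atomic<int> &x) {
    int cur = x.load(std::memory_order_relaxed);
    while (!x.compare_exchange_weak(cur, cur + 1,
                                    std::memory_order_acq_rel,
                                    std::memory_order_relaxed)) {
        // on failure, cur is reloaded with the current value; just retry
    }
    return cur;   // value before the increment
}
```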
Fix 3: Hazard pointers / SMR
Doesn’t fix ABA directly, but removes its consequence. If popped nodes are only freed after every thread has proven it holds no reference, then even an ABA-confused CAS doesn’t dereference garbage. See Safe Memory Reclamation.
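A toy single-reader sketch of the idea (names mine; real SMR such as Folly’s handles many threads, per-thread slots, and retirement thresholds): the reader publishes the pointer it is about to dereference, and reclamation skips anything published.

```cpp
#include <atomic>
#include <vector>

struct Node { int value; };

std::atomic<Node *> hazard{nullptr};   // the single reader's published pointer
std::vector<Node *> retired;           // popped nodes awaiting reclamation

void retire(Node *n) { retired.push_back(n); }

// Free every retired node that no hazard pointer protects; return the count.
int scan_and_reclaim() {
    int freed = 0;
    std::vector<Node *> deferred;
    for (Node *n : retired) {
        if (n == hazard.load()) deferred.push_back(n);  // still in use: defer
        else { delete n; ++freed; }
    }
    retired = std::move(deferred);
    return freed;
}
```

Even if an ABA-confused CAS leaves a stale pointer in hand, the node it names is never freed while published, so dereferencing it stays safe.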
Fix 4: Application-level modification counters
Same idea as CASVD but at a higher level: keep a monotonically-increasing version next to the protected state, and readers snapshot it. Java’s ArrayList / HashMap iterators do this: if the mod count changed since the iterator was created, they throw ConcurrentModificationException rather than silently reading stale structure.
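A minimal C++ analogue of the fail-fast pattern (hypothetical `VersionedList`, not any real library): mutations bump a version, iterators snapshot it and refuse to continue once the snapshot is stale.

```cpp
#include <cstddef>
#include <stdexcept>
#include <vector>

// App-level modification counter: every structural change bumps mod_count,
// and an iterator created earlier detects the bump and bails out.
struct VersionedList {
    std::vector<int> items;
    long mod_count = 0;

    void push_back(int v) { items.push_back(v); ++mod_count; }

    struct Iterator {
        const VersionedList *list;
        std::size_t pos;
        long expected_mod;   // snapshot taken at creation
        int next() {
            if (expected_mod != list->mod_count)
                throw std::runtime_error("concurrent modification");
            return list->items[pos++];
        }
    };
    Iterator begin() const { return Iterator{this, 0, mod_count}; }
};
```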
Which fix is used in practice?
Depends on the platform and how much you care about memory reclamation:
- Tagging (CASVD) is the go-to on x86, which lacks LL/SC. `cmpxchg16b` gives 128-bit double-wide CAS. Java’s `AtomicStampedReference` and most Windows `Interlocked*` APIs work this way
- LL/SC is the native hardware answer on ARM, POWER, RISC-V. Rust/C++ `std::atomic::compare_exchange_weak` lowers to LL/SC on these targets, which is why `_weak` can spuriously fail (the reservation gets invalidated)
- Hazard pointers / SMR is what serious lock-free libraries use when the data structure involves dynamic memory: Facebook’s Folly, the proposed C++26 hazard-pointer API, Linux kernel RCU (a related idea for read-heavy structures). Rust’s crossbeam crate uses epoch-based reclamation, a cousin of hazard pointers
- App-level mod counters are used for correctness-flagging rather than lock-free data structures: Java collections throw on stale iterators instead of trying to be lock-free
Rule of thumb
For a lock-free container in production: hazard pointers or epoch-based reclamation for the free-safety problem, plus CASVD or LL/SC (whichever the target CPU supports) for the ABA detection problem. They solve different halves and are typically combined.