Compare-and-Swap Instruction

Compare-and-swap atomically reads a memory location, compares it to an expected value, and writes a new value if they match, all in one uninterruptible step. Also called compare-and-exchange (the cmpxchg instruction on x86). Covered in ECE459 L12.

CAS returns bool. If val == comp, write nval; else nothing:

bool CAS( int &val, int comp, int nval ) {
    // atomic
    if ( val == comp ) { val = nval; return true; }
    return false;
}
// spinlock:
while ( ! CAS( Lock, OPEN, CLOSED ) );

CASV (compare-and-set-value): on failure, copies the current value of val back into comp by reference. Saves a reload in retry loops. In Stack::push, CAS failure needs n.next = top; at the top of each iteration to refresh, whereas CASV writes the fresh top into n.next for free:

bool CASV( int &val, int &comp, int nval ) {
    if ( val == comp ) { val = nval; return true; }
    comp = val;
    return false;
}

CASVD (compare-and-set-value-double-wide): operates on a 64/128-bit atom for a 32/64-bit machine. Lets you pack (pointer, counter) into a single atomic operation, which solves ABA probabilistically for lock-free structures:

bool CASVD( uintS_t &val, uintS_t &comp, uintS_t nval ) {
    if ( val == comp ) { val = nval; return true; }
    comp = val;
    return false;
}

See Lock-Free Stack for CASVD use on (top, count).

Lock-Free Programming

Why is CAS the workhorse of lock-free code?

It packs read, compare, and write into one atomic instruction. That’s enough to build a retry-on-conflict loop without any explicit locking.

The canonical CAS loop: read the current value, compute the new value, CAS; on conflict reread and retry.

loop {
    let old = atomic.load(Ordering::Acquire);
    let new = f(old);
    if atomic.compare_exchange(old, new, Ordering::AcqRel, Ordering::Acquire).is_ok() {
        break;
    }
}

Only one concurrent CAS on a given location succeeds, so the system as a whole always makes progress even if an individual thread has to spin.

Where CAS shows up:

  • Head and tail pointer updates in lock-free stacks and queues (see Lock-Free Stack)
  • Flipping a lock flag from free to held (the spinlock example below)
  • Installing a singleton pointer exactly once for lazy initialization
  • Atomic counters when fetch_add isn’t expressive enough, e.g. saturating or bounded increments
  • Work-stealing deques in concurrent schedulers

Rust: compare_and_swap(expected, new, Ordering) on atomic types. Also swap (no comparison, just returns old value), and the more advanced compare_exchange and compare_exchange_weak.

Spinlock via CAS

use std::sync::atomic::{AtomicBool, Ordering, spin_loop_hint};
let my_lock = AtomicBool::new(false);
while my_lock.compare_and_swap(false, true, Ordering::SeqCst) == true {
    spin_loop_hint();
}
// critical section
my_lock.store(false, Ordering::SeqCst);

spin_loop_hint() tells the CPU it’s fine to yield a hyperthread or enter low-power mode while spinning. Spinlocks only make sense when the expected wait is shorter than two context switches.

ABA false positives

CAS can succeed even when the value round-tripped A→B→A while you weren’t looking. See ABA Problem.