Exotic Atomic Instruction

Beyond CAS and SC, some ISAs shipped higher-level atomic primitives that do entire data-structure operations in one uninterruptible step. The canonical examples are VAX INSQUE and REMQUE, atomic doubly-linked-list insert and remove, and IBM 370 CS/CDS (compare-and-swap / compare-double-and-swap).

Why bother shipping INSQUE in silicon?

Because a concurrent doubly-linked list with plain CAS is extremely hard (you’d need transactional memory or multi-step hazard-pointer protocols). The VAX designers noticed that OS kernels spent enormous time enqueueing/dequeueing PCBs on run queues and wait queues, so they put the operation directly in the CPU. It’s the lock-free dream without the lock-free gymnastics, at the cost of ISA complexity that modern RISC designs reject.

VAX INSQUE / REMQUE

INSQUE  entry, predecessor        ; atomically splice `entry` after `predecessor`
REMQUE  entry, destination_reg    ; atomically unlink `entry`, result to dest

Both run atomically across multiprocessor bus, an interrupted INSQUE leaves the list either fully updated or untouched, never partially linked.

Why these instructions died

  • CISC out, RISC in, modern ISAs (ARM, RISC-V) prefer small, composable atomics (LL/SC, atomic-fetch-op) and build higher-level primitives in software.
  • Cache-coherence protocols handle much of what INSQUE did, at lower silicon cost.
  • Transactional memory (Intel TSX, IBM z) generalizes the idea: arbitrary multi-word atomic regions in hardware.

HTM, SPECULATE / LOCK / COMMIT (Buhr §11.3)

Hardware transactional memory generalizes CAS to N reservations (typically 4/6/8, e.g. AMD64 Advanced Synchronization Facility). Models a CPU transaction like a database transaction: optimistically execute, commit or rollback.

SPECULATE                   ; start speculative region; clear zero flag
                            ; next instruction checks for abort, branches to retry
LOCK  MOV r, [addr]         ; reserved load — not yet visible to other CPUs
LOCK  MOV [addr2], r        ; reserved store — not yet visible
COMMIT                      ; end speculative region
                            ;   no conflict ⇒ publish all MOVs atomically
                            ;   conflict    ⇒ set fail flag, discard
                            ;                 reservations, restore regs,
                            ;                 branch to SPECULATE

Commit either makes all tagged LOCK MOVs visible together, or none, exactly the semantics you’d need to update both prev and next of a doubly-linked node atomically. Enables several lock-free structures without the ABA headache.

STM, atomic { ... } blocks (Buhr §11.3)

Software transactional memory drops the hardware reservation limit entirely: arbitrary-size atomic blocks, bookkept in software.

void push( header & h, node & n ) {
    atomic {                     // SPECULATE
        n.next = top;            // LOCK / MOV
        top = &n;
    }                            // COMMIT
}

Runtime records every memory location read and written inside atomic { ... }, plus the values mutated. On conflict with another transaction, it rolls back and restarts.

Drawbacks

  • Bookkeeping cost: tracking read/write sets + rollback usually degrades performance vs fine-grained locking.
  • Lock-insertion alternative: some STMs implement atomic by auto-inserting locks, but finding all accesses and ordering the acquisitions to avoid deadlock is hard.

Modern descendants

EraInstructionScope
VAXINSQUE/REMQUEatomic linked-list ops
IBM 370CS, CDSsingle- and double-word CAS
x86LOCK CMPXCHG, CMPXCHG16Bsingle/double CAS
ARM/PowerPCLL/SCreservation-based RMW
Intel TSXXBEGIN/XENDhardware transactional memory (multi-word)