Exotic Atomic Instruction
Beyond CAS and SC, some ISAs shipped higher-level atomic primitives that do entire data-structure operations in one uninterruptible step. The canonical examples are VAX INSQUE and REMQUE, atomic doubly-linked-list insert and remove, and IBM 370 CS/CDS (compare-and-swap / compare-double-and-swap).
Why bother shipping
INSQUEin silicon?Because a concurrent doubly-linked list with plain CAS is extremely hard (you’d need transactional memory or multi-step hazard-pointer protocols). The VAX designers noticed that OS kernels spent enormous time enqueueing/dequeueing PCBs on run queues and wait queues, so they put the operation directly in the CPU. It’s the lock-free dream without the lock-free gymnastics, at the cost of ISA complexity that modern RISC designs reject.
VAX INSQUE / REMQUE
INSQUE entry, predecessor ; atomically splice `entry` after `predecessor`
REMQUE entry, destination_reg ; atomically unlink `entry`, result to dest
Both run atomically across multiprocessor bus, an interrupted INSQUE leaves the list either fully updated or untouched, never partially linked.
Why these instructions died
- CISC out, RISC in, modern ISAs (ARM, RISC-V) prefer small, composable atomics (LL/SC, atomic-fetch-op) and build higher-level primitives in software.
- Cache-coherence protocols handle much of what
INSQUEdid, at lower silicon cost. - Transactional memory (Intel TSX, IBM z) generalizes the idea: arbitrary multi-word atomic regions in hardware.
HTM, SPECULATE / LOCK / COMMIT (Buhr §11.3)
Hardware transactional memory generalizes CAS to N reservations (typically 4/6/8, e.g. AMD64 Advanced Synchronization Facility). Models a CPU transaction like a database transaction: optimistically execute, commit or rollback.
SPECULATE ; start speculative region; clear zero flag
; next instruction checks for abort, branches to retry
LOCK MOV r, [addr] ; reserved load — not yet visible to other CPUs
LOCK MOV [addr2], r ; reserved store — not yet visible
COMMIT ; end speculative region
; no conflict ⇒ publish all MOVs atomically
; conflict ⇒ set fail flag, discard
; reservations, restore regs,
; branch to SPECULATE
Commit either makes all tagged LOCK MOVs visible together, or none, exactly the semantics you’d need to update both prev and next of a doubly-linked node atomically. Enables several lock-free structures without the ABA headache.
STM, atomic { ... } blocks (Buhr §11.3)
Software transactional memory drops the hardware reservation limit entirely: arbitrary-size atomic blocks, bookkept in software.
void push( header & h, node & n ) {
atomic { // SPECULATE
n.next = top; // LOCK / MOV
top = &n;
} // COMMIT
}Runtime records every memory location read and written inside atomic { ... }, plus the values mutated. On conflict with another transaction, it rolls back and restarts.
Drawbacks
- Bookkeeping cost: tracking read/write sets + rollback usually degrades performance vs fine-grained locking.
- Lock-insertion alternative: some STMs implement
atomicby auto-inserting locks, but finding all accesses and ordering the acquisitions to avoid deadlock is hard.
Modern descendants
| Era | Instruction | Scope |
|---|---|---|
| VAX | INSQUE/REMQUE | atomic linked-list ops |
| IBM 370 | CS, CDS | single- and double-word CAS |
| x86 | LOCK CMPXCHG, CMPXCHG16B | single/double CAS |
| ARM/PowerPC | LL/SC | reservation-based RMW |
| Intel TSX | XBEGIN/XEND | hardware transactional memory (multi-word) |