Barrier Synchronization

Fetch Increment Barrier

A fetch-increment barrier is the simplest reusable barrier: all N tasks spin on a shared flag after bumping a count with fetch-and-increment. It is spinning, requires T == G (one thread per group slot), and uses a toggle flag instead of a per-cycle reset to avoid the reinitialization race.

Why a flag toggle rather than count == 0?

Because if the wake condition were count == 0, a slow waiter in cycle i could see count reach 0, resume, race back into the barrier for cycle i+1, and observe count == 0 again before the fast tasks even arrived. The toggle flag flips every cycle, and each waiter captures the flag’s negation locally. You exit when flag == negflag, which only happens once the Nth task toggles it.

Implementation

struct Barrier {
    size_t group = 0;
    volatile bool flag = false;
    volatile size_t count = 0;
};
 
void block( Barrier & b ) {
    size_t negflag = ! b.flag;                   // capture expected flag for this cycle
    if ( FetchInc( b.count, 1 ) < b.group - 1 ) {
        await( b.flag == negflag );              // spin until Nth arrives
    } else {
        // SAFE ACTION BEFORE TRIGGERING BARRIER
        b.count = 0;                             // reset for next cycle
        b.flag  = negflag;                       // flip: releases all spinners
    }
}

The atomic FetchInc returns the count before the increment. If it is less than group - 1, this task is not the last, so spin. Otherwise, this is the Nth task: reset count and flip the flag, which unblocks everyone at once.

Why local negflag

Each task computes negflag = !b.flag before its fetch-increment. This freezes a per-cycle target without needing extra state in the barrier. The Nth task flips b.flag to match, and waiters observe the flip.

Failure of await( b.count == 0 )

If the spin condition were count == 0, a task could spin through one full cycle and a partial next cycle without ever seeing the release. Concretely: Nth task resets count = 0, a fast waiter wakes, runs past the barrier, loops back, and enters again. It sees count = 1 on its own arrival, then a later task pushes it back down. The check is unreliable.

Spinning vs blocking

Fetch-increment barriers spin on the flag, wasting CPU while waiting. They are right when the per-phase work is roughly balanced across tasks so waits are short. For unbalanced phases use uBarrier, which blocks instead of spins.