Fetch Increment Barrier
A fetch-increment barrier is the simplest reusable barrier: all N tasks spin on a shared flag after bumping a count with fetch-and-increment. It is spinning, requires T == G (one thread per group slot), and uses a toggle flag instead of a per-cycle reset to avoid the reinitialization race.
Why a flag toggle rather than
count == 0?Because if the wake condition were
count == 0, a slow waiter in cycleicould seecountreach 0, resume, race back into the barrier for cyclei+1, and observecount == 0again before the fast tasks even arrived. The toggle flag flips every cycle, and each waiter captures the flag’s negation locally. You exit whenflag == negflag, which only happens once the Nth task toggles it.
Implementation
struct Barrier {
size_t group = 0;
volatile bool flag = false;
volatile size_t count = 0;
};
void block( Barrier & b ) {
size_t negflag = ! b.flag; // capture expected flag for this cycle
if ( FetchInc( b.count, 1 ) < b.group - 1 ) {
await( b.flag == negflag ); // spin until Nth arrives
} else {
// SAFE ACTION BEFORE TRIGGERING BARRIER
b.count = 0; // reset for next cycle
b.flag = negflag; // flip: releases all spinners
}
}The atomic FetchInc returns the count before the increment. If it is less than group - 1, this task is not the last, so spin. Otherwise, this is the Nth task: reset count and flip the flag, which unblocks everyone at once.
Why local negflag
Each task computes negflag = !b.flag before its fetch-increment. This freezes a per-cycle target without needing extra state in the barrier. The Nth task flips b.flag to match, and waiters observe the flip.
Failure of await( b.count == 0 )
If the spin condition were count == 0, a task could spin through one full cycle and a partial next cycle without ever seeing the release. Concretely: Nth task resets count = 0, a fast waiter wakes, runs past the barrier, loops back, and enters again. It sees count = 1 on its own arrival, then a later task pushes it back down. The check is unreliable.
Spinning vs blocking
Fetch-increment barriers spin on the flag, wasting CPU while waiting. They are right when the per-phase work is roughly balanced across tasks so waits are short. For unbalanced phases use uBarrier, which blocks instead of spins.