Barrier Synchronization
Saw this while reading the PMPP book, chapter 3.
A barrier for a group of threads or processes is a point in the source code where each thread/process must stop and cannot proceed until all other threads/processes in the group have reached it.
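A minimal sketch of that definition in standard C++17 follows. It assumes a `std::mutex` plus `std::condition_variable` as the blocking mechanism; `OneShotBarrier` and `demo_one_shot` are hypothetical names for illustration, not from any library. The demo checks that no thread gets past the barrier before every thread has arrived.

```cpp
#include <atomic>
#include <cassert>
#include <condition_variable>
#include <cstddef>
#include <mutex>
#include <thread>
#include <vector>

// Hypothetical one-shot barrier: wait() blocks until `expected` threads
// have called it, then releases all of them together.
class OneShotBarrier {
public:
    explicit OneShotBarrier(std::size_t expected) : remaining_(expected) {
        assert(expected > 0);  // a barrier needs at least one participant
    }

    void wait() {
        std::unique_lock<std::mutex> lock(m_);
        if (remaining_ > 0 && --remaining_ == 0) {
            cv_.notify_all();  // last arriver trips the barrier
            return;
        }
        // Block until the count reaches zero. Once tripped, later callers
        // pass straight through, so this object is not reusable.
        cv_.wait(lock, [this] { return remaining_ == 0; });
    }

private:
    std::mutex m_;
    std::condition_variable cv_;
    std::size_t remaining_;
};

// Demo: each thread counts itself in, waits, then checks that every
// thread had arrived before anyone got past the barrier.
bool demo_one_shot(int n) {
    OneShotBarrier b(static_cast<std::size_t>(n));
    std::atomic<int> arrived{0};
    std::atomic<bool> ok{true};
    std::vector<std::thread> threads;
    for (int i = 0; i < n; ++i)
        threads.emplace_back([&] {
            arrived.fetch_add(1);
            b.wait();
            if (arrived.load() != n) ok = false;  // someone passed early
        });
    for (auto& t : threads) t.join();
    return ok.load();
}
```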

Reusable barriers
A one-shot barrier is just a latch: count down to 0, release, done. A reusable barrier resets each cycle, which needs a generation counter so late arrivers from cycle N don't slip through into cycle N+1. See uBarrier for the uC++ coroutine-based implementation.
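The generation-counter idea can be sketched in C++17 as below. This is not uBarrier; `CyclicBarrier` and `demo_cyclic` are hypothetical names, and blocking is done with `std::condition_variable`. Each waiter remembers the generation it arrived in and waits for that generation to change, rather than waiting for the count to reach a particular value.

```cpp
#include <atomic>
#include <cassert>
#include <condition_variable>
#include <cstddef>
#include <mutex>
#include <thread>
#include <vector>

// Hypothetical reusable (cyclic) barrier with a generation counter.
class CyclicBarrier {
public:
    explicit CyclicBarrier(std::size_t n) : n_(n), count_(n) { assert(n > 0); }

    void wait() {
        std::unique_lock<std::mutex> lock(m_);
        const std::size_t my_gen = generation_;  // cycle this caller is in
        if (--count_ == 0) {
            ++generation_;  // end this cycle
            count_ = n_;    // re-arm for the next cycle
            cv_.notify_all();
            return;
        }
        // Wait for *our* cycle to end, not for a count value: a fast thread
        // that already re-entered for the next cycle and decremented count_
        // cannot make this predicate false again.
        cv_.wait(lock, [&] { return generation_ != my_gen; });
    }

private:
    std::mutex m_;
    std::condition_variable cv_;
    const std::size_t n_;
    std::size_t count_;
    std::size_t generation_ = 0;
};

// Demo: n threads run `rounds` cycles through one barrier object.
bool demo_cyclic(int n, int rounds) {
    CyclicBarrier b(static_cast<std::size_t>(n));
    std::atomic<int> arrivals{0};
    std::atomic<bool> ok{true};
    std::vector<std::thread> threads;
    for (int i = 0; i < n; ++i)
        threads.emplace_back([&] {
            for (int r = 0; r < rounds; ++r) {
                arrivals.fetch_add(1);
                b.wait();
                const int seen = arrivals.load();
                // Everyone finished round r; nobody has passed round r+1.
                if (seen < (r + 1) * n || seen > (r + 2) * n) ok = false;
            }
        });
    for (auto& t : threads) t.join();
    return ok.load();
}
```

The bounds check in the demo encodes the guarantee: right after round `r`'s barrier, all `n` threads have arrived `r + 1` times, and none can have passed round `r + 1`'s barrier yet.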
CS343, 3-task ordered example (Buhr §6.3.3)
Three tasks must execute S1/S2/S3 before any runs S5/S6/S7:
T1::main() {          T2::main() {          T3::main() {
    ...                   ...                   ...
    S1                    S2                    S3
    b.block();            b.block();            b.block();   // gather point
    S5                    S6                    S7
}                     }                     }
int main() {
    Barrier b( 3 );                      // 3 tasks must arrive
    T1 x( b ); T2 y( b ); T3 z( b );
}
The barrier blocks the first N-1 callers; the Nth task to arrive trips it, and all N are released together. The total number of arrivals must be declared up front (here, 3).
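For comparison, here is the same three-task ordering written with plain `std::thread` instead of uC++ tasks. It assumes a small mutex/condition-variable class can stand in for uC++'s `Barrier`; the `Barrier` class, `task`, `stmt`, and `run_three_tasks` below are hypothetical. A shared trace records which statements ran so the ordering can be checked.

```cpp
#include <cassert>
#include <condition_variable>
#include <functional>
#include <mutex>
#include <string>
#include <thread>
#include <vector>

// Hypothetical stand-in for uC++'s Barrier: block() waits until `n`
// tasks have called it (one-shot, which is all this example needs).
class Barrier {
public:
    explicit Barrier(int n) : remaining_(n) { assert(n > 0); }
    void block() {
        std::unique_lock<std::mutex> lk(m_);
        if (--remaining_ == 0) { cv_.notify_all(); return; }
        cv_.wait(lk, [this] { return remaining_ <= 0; });
    }
private:
    std::mutex m_;
    std::condition_variable cv_;
    int remaining_;
};

// Thread-safe trace of which statements ran, in order.
std::mutex trace_m;
std::vector<std::string> trace;
void stmt(const std::string& s) {
    std::lock_guard<std::mutex> g(trace_m);
    trace.push_back(s);
}

// Each task runs its "before" statement, gathers, then its "after" one.
void task(Barrier& b, const char* before, const char* after) {
    stmt(before);   // S1 / S2 / S3
    b.block();      // gather point
    stmt(after);    // S5 / S6 / S7
}

std::vector<std::string> run_three_tasks() {
    trace.clear();
    Barrier b(3);   // 3 tasks must arrive
    std::thread x(task, std::ref(b), "S1", "S5");
    std::thread y(task, std::ref(b), "S2", "S6");
    std::thread z(task, std::ref(b), "S3", "S7");
    x.join(); y.join(); z.join();  // wait for all tasks before returning
    return trace;
}
```

Within each half, the order of S1/S2/S3 (and of S5/S6/S7) is unspecified; only the split at the gather point is guaranteed.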
One-shot vs. cyclic
[Figure: one-shot vs. cyclic barrier; labels: start (one-shot), start/end for each cycle (cyclic)]
Cyclic reuse is where the reinitialization problem bites: a fast thread races ahead into cycle N+1 before a slow cycle-N thread has left the barrier.
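Another way to sidestep this race, sketched below as a swapped-in technique rather than anything from the course notes, is sense reversal: a spinning barrier whose generation counter is a single bit. `SenseBarrier` and `demo_sense` are hypothetical names; the code assumes C++17 `std::atomic` and busy-waits with `std::this_thread::yield()`, which is only reasonable when waits are short.

```cpp
#include <atomic>
#include <cassert>
#include <thread>
#include <vector>

// Hypothetical sense-reversing spin barrier (a one-bit generation counter).
class SenseBarrier {
public:
    explicit SenseBarrier(int n) : n_(n), count_(n) { assert(n > 0); }

    // `local_sense` is per-thread state; it must start equal to the
    // barrier's initial sense (false) and is flipped on every call.
    void wait(bool& local_sense) {
        local_sense = !local_sense;  // the sense value that opens this cycle
        if (count_.fetch_sub(1, std::memory_order_acq_rel) == 1) {
            // Last arriver: re-arm the count *before* publishing the flip,
            // so threads released into the next cycle see a full count.
            count_.store(n_, std::memory_order_relaxed);
            sense_.store(local_sense, std::memory_order_release);
        } else {
            // A fast thread already in the next cycle spins on the other
            // sense value, so it is never confused with this cycle.
            while (sense_.load(std::memory_order_acquire) != local_sense)
                std::this_thread::yield();
        }
    }

private:
    const int n_;
    std::atomic<int> count_;
    std::atomic<bool> sense_{false};
};

// Demo: n threads run `rounds` cycles through one barrier object.
bool demo_sense(int n, int rounds) {
    SenseBarrier b(n);
    std::atomic<int> arrivals{0};
    std::atomic<bool> ok{true};
    std::vector<std::thread> threads;
    for (int i = 0; i < n; ++i)
        threads.emplace_back([&] {
            bool sense = false;  // per-thread sense, matches initial sense_
            for (int r = 0; r < rounds; ++r) {
                arrivals.fetch_add(1);
                b.wait(sense);
                const int seen = arrivals.load();
                // Everyone finished round r; nobody has passed round r+1.
                if (seen < (r + 1) * n || seen > (r + 2) * n) ok = false;
            }
        });
    for (auto& t : threads) t.join();
    return ok.load();
}
```

One bit suffices because the sense cannot flip twice while a slow thread is still spinning: the next flip needs that slow thread to arrive at the next cycle too.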
Coordinator / Workers two-barrier pattern (§6.3.3)
Two barriers let the coordinator accumulate results while workers re-initialize for the next round, without either side stalling the other:
Barrier start( G + 1 ), end( G + 1 );      // G workers + 1 coordinator
// Coordinator                             // Worker
// general initialization                  // worker initialize
start.block();                             start.block();   // all ready
// do other work (or collect subtotals)    // do work (produce subtotal)
end.block();                               end.block();     // all done
// close down / loop                       // close down or reinit
Alternative: the last worker to arrive does the coordination, but then the workers can't re-initialize while coordination is in progress. Two barriers are cleaner.
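The two-barrier pattern can be sketched with `std::thread`, assuming a generation-counting `Barrier` (hypothetical, standing in for uC++'s) and a made-up workload in which worker `w` produces subtotal `(r + 1) * (w + 1)` in round `r`. The main thread plays the coordinator; `run_rounds` is likewise a hypothetical name.

```cpp
#include <cassert>
#include <condition_variable>
#include <cstddef>
#include <mutex>
#include <thread>
#include <vector>

// Hypothetical reusable barrier: block() waits until `n` callers arrive;
// a generation counter lets the same object be used round after round.
class Barrier {
public:
    explicit Barrier(int n) : n_(n), count_(n) { assert(n > 0); }
    void block() {
        std::unique_lock<std::mutex> lk(m_);
        const unsigned long gen = generation_;
        if (--count_ == 0) {
            ++generation_;  // release this round
            count_ = n_;    // re-arm for the next round
            cv_.notify_all();
            return;
        }
        cv_.wait(lk, [&] { return generation_ != gen; });
    }
private:
    std::mutex m_;
    std::condition_variable cv_;
    const int n_;
    int count_;
    unsigned long generation_ = 0;
};

// Runs `rounds` rounds with G workers; returns the coordinator's total.
long long run_rounds(int G, int rounds) {
    Barrier start(G + 1), end(G + 1);  // G workers + 1 coordinator
    std::vector<long long> subtotal(static_cast<std::size_t>(G), 0LL);
    long long total = 0;
    std::vector<std::thread> workers;
    for (int w = 0; w < G; ++w)
        workers.emplace_back([&, w] {
            for (int r = 0; r < rounds; ++r) {
                start.block();                      // all ready
                subtotal[w] = (r + 1LL) * (w + 1);  // do work (made-up)
                end.block();                        // all done
                // Per-round re-initialization would go here; it overlaps
                // with the coordinator's accumulation below.
            }
        });
    for (int r = 0; r < rounds; ++r) {  // coordinator = main thread
        start.block();
        // do other work while the workers compute
        end.block();
        for (long long s : subtotal) total += s;  // collect subtotals
    }
    for (auto& t : workers) t.join();
    return total;
}
```

The coordinator can read `subtotal` after `end.block()` without a data race because no worker can write the next round's value until the coordinator itself arrives at `start`. Meanwhile the workers are free to re-initialize between `end` and `start`, which is the overlap the two-barrier design buys.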
Why not spawn fresh tasks each cycle? Creating and deleting tasks is expensive, so reusing long-lived workers and synchronizing them with barriers is cheaper.