Barrier Synchronization

uBarrier

uBarrier is uC++’s reusable N-way rendezvous primitive. N tasks each call block(); the first N−1 wait, the Nth releases them all, and the barrier resets for the next round. Unlike a one-shot latch, it can be reused cycle after cycle. It is implemented as a coroutine, so per-cycle state (count, generation) can live on the coroutine’s stack.

Why a coroutine and not a plain counter + semaphore?

Because reusability needs a generation: once the Nth arrival has signalled the others, tasks arriving for the next cycle must not consume a signal meant for the previous cycle. A coroutine keeps the generation in local state across suspend/resume, avoiding the races that a raw counter-plus-semaphore design is prone to.
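The generation idea can be illustrated outside uC++. The following is a minimal sketch in standard C++ (mutex plus condition variable), not the uC++ implementation; the names GenBarrier and lockstep are hypothetical and exist only for this illustration. Each caller records the generation it arrived in and waits until that generation closes, so arrivals for the next cycle cannot be confused with the current one.

```cpp
#include <atomic>
#include <condition_variable>
#include <mutex>
#include <thread>
#include <vector>

// Hypothetical reusable barrier in standard C++ (an assumption for
// illustration; uBarrier itself is a uC++ coroutine monitor).
class GenBarrier {
    std::mutex m;
    std::condition_variable cv;
    const unsigned int total;
    unsigned int count = 0;          // arrivals in the current cycle
    unsigned long generation = 0;    // bumped once per completed cycle
  public:
    explicit GenBarrier( unsigned int total ) : total( total ) {}
    void block() {
        std::unique_lock<std::mutex> lock( m );
        unsigned long myGen = generation;       // cycle this caller belongs to
        if ( ++count == total ) {               // Nth arrival
            count = 0;                          // reset for the next cycle
            generation += 1;                    // close the current cycle
            cv.notify_all();                    // release the N-1 waiters
        } else {
            // Wait until *this* caller's cycle has closed; later arrivals
            // change count but never reopen an old generation.
            cv.wait( lock, [&] { return generation != myGen; } );
        }
    }
};

// Runs `workers` threads through `phases` rounds. Returns true if no thread
// started a phase before every thread had finished the previous one.
bool lockstep( unsigned int workers, unsigned int phases ) {
    GenBarrier b( workers );
    std::atomic<unsigned int> done{ 0 };        // phase completions so far
    std::atomic<bool> ok{ true };
    std::vector<std::thread> ts;
    for ( unsigned int w = 0; w < workers; w += 1 ) {
        ts.emplace_back( [&] {
            for ( unsigned int p = 0; p < phases; p += 1 ) {
                if ( done.load() < p * workers ) ok = false;  // entered phase p early
                done += 1;
                b.block();
            }
        } );
    }
    for ( auto & t : ts ) t.join();
    return ok.load() && done.load() == workers * phases;
}
```

Calling lockstep( 4, 3 ) mirrors the four-worker, three-phase shape used elsewhere in these notes. Removing the generation and waiting on count alone would let a fast thread re-enter block() and disturb the count before slower waiters have observed the release, which is the race described above.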

Interface

_Mutex _Coroutine uBarrier {
public:
    uBarrier( unsigned int total );
    unsigned int total() const;            // total N
    unsigned int waiters() const;          // currently blocked
    void reset( unsigned int total );      // change N between cycles
    virtual void block();                  // the rendezvous call
    virtual void last();                   // hook: runs on Nth arrival, before release
};

Usage

uBarrier b( 4 );
_Task Worker {
    void main() {
        for ( int phase = 0; phase < 3; phase += 1 ) {
            doPhase( phase );              // application-defined work for this phase
            b.block();                     // wait for all 4 workers
        }
    }
};

Accumulator: matrix sum via a block( int ) overload (Buhr §6.3.3.2)

Classic uBarrier application: each worker deposits one row’s subtotal, then blocks. Subtotals are therefore accumulated in order of arrival at the barrier rather than in order of task termination, and the last() hook snapshots the grand total before any task is released.

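#include <uBarrier.h>
#include <iostream>
using namespace std;

_Cormonitor Accumulator : public uBarrier {
    int total_ = 0, temp = 0;                  // running sum, snapshot of last completed cycle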
    uBaseTask * Gth_ = nullptr;
  protected:
    void last() {                              // runs on Nth arriver
        temp = total_; total_ = 0;             // snapshot + reset for reuse
        Gth_ = &uThisTask();                   // remember who finished last
    }
  public:
    Accumulator( int rows ) : uBarrier( rows ) {}
    void block( int subtotal ) {
        total_ += subtotal;                    // deposit before barrier
        uBarrier::block();                     // wait for all adders
    }
    int total() { return temp; }
    uBaseTask * Gth() { return Gth_; }
};
 
_Task Adder {
    int * row, size;
    Accumulator & acc;
    void main() {
        int subtotal = 0;
        for ( int r = 0; r < size; r += 1 ) subtotal += row[r];   // sum this row
        acc.block( subtotal );                 // contribute + rendezvous
    }
  public:
    Adder( int row[], int size, Accumulator & acc )
        : row(row), size(size), acc(acc) {}
};
 
int main() {
    enum { rows = 10, cols = 10 };
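    int matrix[rows][cols];
    for ( int r = 0; r < rows; r += 1 )        // fill with sample data (values are illustrative)
        for ( int c = 0; c < cols; c += 1 ) matrix[r][c] = 1;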
    Accumulator acc( rows );                   // barrier for `rows` tasks
    {
        uArray( Adder, adders, rows );
        for ( unsigned int r = 0; r < rows; r += 1 )
            adders[r]( matrix[r], cols, acc );
    } // wait adders
    cout << acc.total() << " " << acc.Gth() << endl;
}

Key points:

  • Overload block with an int parameter to accept per-task data before calling uBarrier::block().
  • last() runs after all N tasks have arrived and before any of them resumes, which makes it a safe place to snapshot the total and zero it for reuse.
  • The Accumulator’s state persists across cycles, and last() zeroes total_, so the same Accumulator can be reused for a new matrix each round.

last() hook

Override to perform an action after all N arrive but before any is released, e.g., aggregate per-phase results, flip a double-buffer, log. It runs once per cycle on the thread of the last arriver.
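The hook's timing can be sketched in standard C++ as well. This is an illustration under stated assumptions, not uC++ code: HookBarrier, FlipBarrier, and runCycles are hypothetical names, and the barrier is built from a mutex and a condition variable. The point it shows is that last() executes on the Nth arriver's thread, while the lock is held, before the waiters are notified.

```cpp
#include <condition_variable>
#include <mutex>
#include <thread>
#include <vector>

// Hypothetical stand-in for a barrier with a last() hook (standard C++).
class HookBarrier {
    std::mutex m;
    std::condition_variable cv;
    const unsigned int total;
    unsigned int count = 0;          // arrivals in the current cycle
    unsigned long generation = 0;    // completed cycles
  protected:
    virtual void last() {}           // default hook does nothing
  public:
    explicit HookBarrier( unsigned int total ) : total( total ) {}
    virtual ~HookBarrier() = default;
    void block() {
        std::unique_lock<std::mutex> lock( m );
        unsigned long myGen = generation;
        if ( ++count == total ) {
            last();                  // Nth arriver's thread, before any release
            count = 0;
            generation += 1;
            cv.notify_all();
        } else {
            cv.wait( lock, [&] { return generation != myGen; } );
        }
    }
};

// Example hook: flip a double-buffer index and count completed cycles.
class FlipBarrier : public HookBarrier {
    int front_ = 0;
    unsigned int cycles_ = 0;
  protected:
    void last() override { front_ = 1 - front_; cycles_ += 1; }
  public:
    using HookBarrier::HookBarrier;
    int front() const { return front_; }
    unsigned int cycles() const { return cycles_; }
};

// Drives `workers` threads through `phases` cycles; returns the number of
// times last() ran and reports the final buffer index through frontOut.
unsigned int runCycles( unsigned int workers, unsigned int phases, int & frontOut ) {
    FlipBarrier b( workers );
    std::vector<std::thread> ts;
    for ( unsigned int w = 0; w < workers; w += 1 )
        ts.emplace_back( [&] {
            for ( unsigned int p = 0; p < phases; p += 1 ) b.block();
        } );
    for ( auto & t : ts ) t.join();
    frontOut = b.front();            // safe to read: all threads have joined
    return b.cycles();
}
```

Because the hook runs while the barrier's lock is held, it needs no extra synchronization for the state it touches, but it should stay short: every waiting task is held up until it returns.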