Kogge-Stone Adder
A parallel prefix-sum algorithm that reaches O(log N) step complexity by doubling the stride each iteration: at step s every active lane combines with the element 2^s back. Introduced to me through the PMPP book.
Kogge-Stone vs sequential and vs Brent-Kung?
- Sequential scan: O(N) work, O(N) steps
- Kogge-Stone: O(log N) steps, O(N log N) work. Step-efficient but not work-efficient (does more ops than the serial version)
- Brent-Kung: O(log N) steps, O(N) work via up-sweep then down-sweep. Work-efficient, roughly 2× the critical path of Kogge-Stone
On a GPU, the right choice depends on whether you are compute-bound or latency-bound. See MIT 6.S894 Lec 7.
The name comes from the original 1973 hardware adder by Peter Kogge and Harold Stone, where the same tree structure was used to propagate carries in parallel. The algorithm generalizes to any associative operator.