FLOPS
Floating point operations per second. A measure of compute throughput, used wherever the workload is dominated by floating-point arithmetic (scientific computing, ML training). Note the casing convention: FLOPS is a rate (per second), while FLOPs is a count of operations.
What does a modern ML workload cost in FLOPs?
Training a 1B-parameter LLaMA on 30B tokens, using the standard rule of thumb of 6 FLOPs/param/token for transformer pretraining (forward + backward pass; the Kaplan scaling-law accounting, also cited in MIT 6.S894 Lec 1, slide 5):
6 FLOPs/param/token × 1e9 params × 30e9 tokens ≈ 1.8e20 FLOPs total.
On ~8 H100s (each ~1 PFLOPS peak dense bf16), that is ~6 hours at full utilization, or roughly a day at a realistic 30-50% MFU.
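A minimal sketch of that arithmetic. The 40% MFU figure is an assumption (large-scale runs typically land in the 30-50% range); the other inputs come from the estimate above:

```python
# Back-of-envelope: total pretraining FLOPs via the 6 * N * D rule,
# then wall-clock time on a small H100 cluster.

params = 1e9          # N: model parameters (1B)
tokens = 30e9         # D: training tokens (30B)
flops_total = 6 * params * tokens          # ~1.8e20 FLOPs

gpus = 8
peak_per_gpu = 1e15   # ~1 PFLOPS peak dense bf16 per H100 (rounded)
mfu = 0.4             # assumed model FLOPs utilization

seconds = flops_total / (gpus * peak_per_gpu * mfu)
print(f"total:      {flops_total:.2e} FLOPs")
print(f"wall-clock: {seconds / 3600:.1f} hours")  # ~15.6 h, i.e. roughly a day
```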
Unit ladder
- MFLOPS (the Cray-1, first installed in 1976, peaked at 160 MFLOPS)
- GFLOPS (a naive C matmul on a laptop)
- TFLOPS (a consumer GPU)
- PFLOPS (a single H100 at bf16)
- EFLOPS (aggregate throughput of frontier training clusters)
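To see where a machine sits on this ladder, time a matmul and divide the operation count by the wall-clock. A sketch with NumPy; the 2n^3 figure is the standard FLOP count for a dense n×n matmul:

```python
# Measure achieved FLOPS of a dense matmul.
# An n x n matmul costs ~2*n^3 floating-point ops
# (n multiplies + n-1 adds per output element, n^2 outputs).
import time
import numpy as np

n = 2048
a = np.random.rand(n, n).astype(np.float32)
b = np.random.rand(n, n).astype(np.float32)

a @ b                      # warm-up (thread pool, caches)
t0 = time.perf_counter()
c = a @ b
dt = time.perf_counter() - t0

print(f"{2 * n**3 / dt / 1e9:.1f} GFLOPS")  # 100s of GFLOPS on a typical laptop CPU
```

A naive triple-loop version of the same multiply, as in the GFLOPS bullet above, runs orders of magnitude slower than the BLAS call NumPy dispatches to.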
TOPS
TOPS = Tera Operations per Second. Same 10^12/s scale as TFLOPS, but usually quoted for integer or low-precision ops on inference accelerators (Apple's Neural Engine, NVIDIA Tensor Cores, edge TPUs); a common headline spec on accelerator product sheets.
https://semiengineering.com/tops-memory-throughput-and-inference-efficiency/
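The core point of the linked article is the roofline argument: a chip only reaches its rated TOPS when the workload's arithmetic intensity (ops per byte moved) exceeds the ridge point set by memory bandwidth. A toy calculation with made-up chip numbers, not any specific product's spec sheet:

```python
# Roofline sketch: rated TOPS vs. memory bandwidth.
tops = 100e12        # rated INT8 throughput: 100 TOPS (assumed)
bandwidth = 100e9    # DRAM bandwidth: 100 GB/s (assumed)

ridge = tops / bandwidth   # ops/byte needed to be compute-bound
print(f"ridge point: {ridge:.0f} ops/byte")

# A memory-bound layer at 50 ops/byte reaches only a fraction of peak:
intensity = 50
achieved = min(tops, intensity * bandwidth)
print(f"achieved: {achieved / 1e12:.0f} TOPS of {tops / 1e12:.0f} rated")
```

With these numbers the ridge point is 1000 ops/byte, so a layer at 50 ops/byte gets 5 of the 100 rated TOPS: the headline number says little without the bandwidth next to it.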