FLOPS

Floating point operations per second: a measure of compute throughput, the standard yardstick for scientific and ML workloads dominated by floating-point arithmetic. Note the casing convention: FLOPS (capital S) is a rate, while FLOPs (lowercase s) is a count of operations.

What does a modern ML workload cost in FLOPs?

Training a 1B-parameter LLaMA on 30B tokens at the standard 6 FLOPs/param/token rule of thumb for transformer pretraining (forward + backward pass, Kaplan scaling-law accounting; MIT 6.S894 Lec 1, slide 5):

  6 FLOPs/param/token × 1e9 params × 30e9 tokens = 1.8 × 10^20 FLOPs (180 EFLOPs)

On 8 H100s (each ~1 PFLOPS peak bf16, so ~8 PFLOPS aggregate), that is ~22,500 s ≈ 6 hours at peak; at a realistic ~40% utilization, about a day.
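
A minimal sketch of that arithmetic in Python (the 6 FLOPs/param/token rule is from above; the per-GPU peak and the ~40% utilization figure are assumptions):

    # Back-of-the-envelope pretraining cost, per the rule of thumb above.
    params = 1e9                # 1B-parameter model
    tokens = 30e9               # 30B training tokens
    flops_per_param_token = 6   # forward + backward pass

    total_flops = flops_per_param_token * params * tokens
    print(f"total: {total_flops:.2e} FLOPs")        # 1.80e+20

    n_gpus = 8
    peak_flops_per_gpu = 1e15   # ~1 PFLOPS bf16 per H100 (assumed peak)
    mfu = 0.4                   # assumed model-FLOPs utilization

    seconds = total_flops / (n_gpus * peak_flops_per_gpu * mfu)
    print(f"wall clock: {seconds / 3600:.1f} h")    # ~15.6 h, about a day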

Unit ladder

  • MFLOPS (the Cray-1, 1976, peaked at 160 MFLOPS)
  • GFLOPS (a naive C matmul on a laptop; see the timing sketch after this list)
  • TFLOPS (a consumer GPU)
  • PFLOPS (a single H100 at bf16)
  • EFLOPS (training budget territory; the 1B example above totals 180 EFLOPs)
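
One way to put a number on the middle rungs is to time a matmul and divide by its operation count. A sketch using NumPy (the matrix size is arbitrary; 2n^3 FLOPs per n×n matmul is the standard accounting):

    # Achieved matmul throughput: an n×n by n×n matmul costs ~2*n**3 FLOPs
    # (n**3 multiplies + n**3 adds).
    import time
    import numpy as np

    n = 2048
    a = np.random.rand(n, n).astype(np.float32)
    b = np.random.rand(n, n).astype(np.float32)
    a @ b  # warm up BLAS / caches

    reps = 10
    t0 = time.perf_counter()
    for _ in range(reps):
        a @ b
    elapsed = (time.perf_counter() - t0) / reps

    print(f"{2 * n**3 / elapsed / 1e9:.1f} GFLOPS")  # laptop CPUs: tens to hundreds

A naive triple loop in C lands in low single-digit GFLOPS on the same machine; the gap is vectorization, cache blocking, and multithreading inside BLAS.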

TOPS

TOPS = Tera Operations per Second. Same 10^12 prefix as TFLOPS, but usually reported for integer or low-precision ops on inference accelerators (Neural Engine, Tensor Cores, edge TPUs). A common headline spec on NVIDIA product sheets.
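
As a worked example of reading a TOPS spec (all numbers here — ops per inference, frame rate, the 40 TOPS accelerator — are made-up illustrative values):

    # Does a workload fit an accelerator's TOPS budget?
    ops_per_inference = 5e9   # hypothetical: ~5 GOPs per frame for a vision model
    fps = 30                  # hypothetical target frame rate

    required = ops_per_inference * fps              # ops/s needed
    print(f"required: {required / 1e12:.2f} TOPS")  # 0.15 TOPS

    accelerator_tops = 40     # hypothetical edge-accelerator spec-sheet number
    print(f"fraction of peak: {required / (accelerator_tops * 1e12):.1%}")  # 0.4%

In practice, peak TOPS rarely predicts real inference throughput; memory bandwidth is usually the gate, which is the point of the article below.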

https://semiengineering.com/tops-memory-throughput-and-inference-efficiency/