FLOPS

Floating point operations per second: a measure of compute throughput, the standard yardstick for scientific and ML workloads dominated by floating-point arithmetic. Note the casing convention: FLOPS (capital S) is a rate, while FLOPs (lowercase s) is a count of operations.

What does a modern ML workload cost in FLOPs?

Training a 1B-parameter LLaMA on 30B tokens at the standard 6 FLOPs/param/token rule of thumb for transformer pretraining (forward + backward pass, Kaplan scaling-law accounting; MIT 6.S894 Lec 1, slide 5):

  6 FLOPs/param/token × 1e9 params × 30e9 tokens = 1.8 × 10^20 FLOPs (180 EFLOPs)

On 8 H100s (each ~1 PFLOPS peak bf16, so ~8 PFLOPS aggregate), that is ~22,500 s ≈ 6 hours at peak; at a realistic ~40% utilization, about a day.
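
A minimal sketch of that arithmetic in Python (the 6 FLOPs/param/token rule is from above; the per-GPU peak and the ~40% utilization figure are assumptions):

    # Back-of-the-envelope pretraining cost, per the rule of thumb above.
    params = 1e9                # 1B-parameter model
    tokens = 30e9               # 30B training tokens
    flops_per_param_token = 6   # forward + backward pass

    total_flops = flops_per_param_token * params * tokens
    print(f"total: {total_flops:.2e} FLOPs")        # 1.80e+20

    n_gpus = 8
    peak_flops_per_gpu = 1e15   # ~1 PFLOPS bf16 per H100 (assumed peak)
    mfu = 0.4                   # assumed model-FLOPs utilization

    seconds = total_flops / (n_gpus * peak_flops_per_gpu * mfu)
    print(f"wall clock: {seconds / 3600:.1f} h")    # ~15.6 h, about a day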

Unit ladder

  • MFLOPS (the Cray-1, 1976, peaked at 160 MFLOPS)
  • GFLOPS (a naive C matmul on a laptop; see the timing sketch after this list)
  • TFLOPS (a consumer GPU)
  • PFLOPS (a single H100 at bf16)
  • EFLOPS (training budget territory; the 1B example above totals 180 EFLOPs)
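
One way to put a number on the middle rungs is to time a matmul and divide by its operation count. A sketch using NumPy (the matrix size is arbitrary; 2n^3 FLOPs per n×n matmul is the standard accounting):

    # Achieved matmul throughput: an n×n by n×n matmul costs ~2*n**3 FLOPs
    # (n**3 multiplies + n**3 adds).
    import time
    import numpy as np

    n = 2048
    a = np.random.rand(n, n).astype(np.float32)
    b = np.random.rand(n, n).astype(np.float32)
    a @ b  # warm up BLAS / caches

    reps = 10
    t0 = time.perf_counter()
    for _ in range(reps):
        a @ b
    elapsed = (time.perf_counter() - t0) / reps

    print(f"{2 * n**3 / elapsed / 1e9:.1f} GFLOPS")  # laptop CPUs: tens to hundreds

A naive triple loop in C lands in low single-digit GFLOPS on the same machine; the gap is vectorization, cache blocking, and multithreading inside BLAS.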

TOPS

TOPS = Tera Operations per Second. Same 10^12 prefix as TFLOPS, but usually reported for integer or low-precision ops on inference accelerators (Neural Engine, Tensor Cores, edge TPUs). A common headline spec on NVIDIA product sheets.
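
As a worked example of reading a TOPS spec (all numbers here — ops per inference, frame rate, the 40 TOPS accelerator — are made-up illustrative values):

    # Does a workload fit an accelerator's TOPS budget?
    ops_per_inference = 5e9   # hypothetical: ~5 GOPs per frame for a vision model
    fps = 30                  # hypothetical target frame rate

    required = ops_per_inference * fps              # ops/s needed
    print(f"required: {required / 1e12:.2f} TOPS")  # 0.15 TOPS

    accelerator_tops = 40     # hypothetical edge-accelerator spec-sheet number
    print(f"fraction of peak: {required / (accelerator_tops * 1e12):.1%}")  # 0.4%

In practice, peak TOPS rarely predicts real inference throughput; memory bandwidth is usually the gate, which is the point of the article below.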

https://semiengineering.com/tops-memory-throughput-and-inference-efficiency/