Scalability
Scalability is how performance trends as load increases: a program’s ability to keep meeting its designed throughput (e.g. x transactions per hour) as x grows.
Why is it separate from raw performance?
A program that is fast for one user may collapse at 1,000. Scalability asks how performance deteriorates with load: if it degrades rapidly, no amount of tuning the hot path will save you (“rearranging deck chairs on the Titanic”).
Also learned in SE464.
Common metrics
- Requests or queries per second
- Concurrent users
- Monthly active users
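These metrics are related by back-of-envelope arithmetic. A minimal sketch, assuming illustrative numbers (1M MAU, ~10 requests per user per day, a 10x peak-to-average factor; none of these figures come from the notes):

```python
# Back-of-envelope: translate monthly active users into a request rate.
# All concrete numbers below are illustrative assumptions.

SECONDS_PER_MONTH = 30 * 24 * 3600  # ~2.59M seconds

def estimated_rps(monthly_active_users, requests_per_user_per_month, peak_factor=10):
    """Average and peak requests/sec implied by a given MAU figure.

    peak_factor models traffic concentrating into busy hours (assumed 10x).
    """
    avg = monthly_active_users * requests_per_user_per_month / SECONDS_PER_MONTH
    return avg, avg * peak_factor

avg, peak = estimated_rps(1_000_000, 300)  # 1M MAU, ~10 requests/user/day
print(f"average ~ {avg:.0f} req/s, peak ~ {peak:.0f} req/s")
```

The point of the exercise: a large-sounding MAU number can imply a modest steady-state request rate, so always convert to requests/sec before sizing anything.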
Scaling challenges
Why scaling is hard even with unlimited money:
- A single machine has a finite clock speed and finite RAM
- Coordinating multiple machines is hard (who does what?)
- Sharing data across machines (where does it live, how do concurrent writes resolve?)
- More machines mean a higher probability that at least one is down at any given time
- Users are globally distributed, so communication latency compounds
- Components must trust each other but reject attackers (authentication)
- Software updates must deploy without downtime
Two approaches
- Vertical scaling: upgrade the machine
  - Simplest and most efficient, but hits a hardware ceiling
  - On cloud, often just a VM reboot
- Horizontal scaling: add more machines
  - Coordinating a cluster is complicated, but necessary for global scale
Cloud compute is called “elastic” because you can change size and quantity quickly.
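The simplest coordination strategy for horizontal scaling is a load balancer spreading requests round-robin across a pool of nodes. A minimal sketch (node names are hypothetical; a real balancer would also handle health checks and failures):

```python
from itertools import cycle

class RoundRobinBalancer:
    """Spread incoming requests evenly across a fixed pool of nodes."""

    def __init__(self, nodes):
        self._nodes = cycle(nodes)  # endlessly repeat the node list

    def route(self, request):
        node = next(self._nodes)  # pick the next node in rotation
        return node, request

lb = RoundRobinBalancer(["node-1", "node-2", "node-3"])
for i in range(4):
    node, req = lb.route(f"req-{i}")
    print(node, req)
# Adding capacity means adding entries to the pool; no single machine's
# hardware ceiling limits total throughput (until coordination costs bite).
```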
Simple scaling math
With N nodes, capacity C per node, total request rate R:
- Scalable: each request goes to one node, so R_max = N·C. If each request hits k nodes (k a constant), R_max = N·C/k.
- Not scalable: each request hits all N nodes, so R_max = C.
where:
- N is the number of nodes
- C is per-node capacity (requests/sec)
- R_max is the highest sustainable total request rate
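The formulas above can be checked numerically. A minimal sketch (the N, C, k values are arbitrary examples):

```python
def r_max_scalable(n_nodes, capacity, k=1):
    """Each request touches k nodes: R_max = N*C/k."""
    return n_nodes * capacity / k

def r_max_unscalable(n_nodes, capacity):
    """Each request touches every node: R_max = N*C/N = C, independent of N."""
    return capacity

C = 100  # requests/sec per node
for n in (1, 10, 100):
    print(n, r_max_scalable(n, C), r_max_unscalable(n, C))
# Scalable capacity grows linearly with N; the unscalable case is stuck at C,
# so adding machines buys nothing.
```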