Scalability
Scalability is how performance trends as load increases: a program’s ability to keep meeting its designed throughput (e.g. x transactions per hour) as x grows.
Why is it separate from raw performance?
A program that is fast for one user may collapse at 1,000. Scalability asks how performance deteriorates with load: if it degrades rapidly, no amount of tuning the hot path will save you (“rearranging deck chairs on the Titanic”).
Also learned in SE464.
Common metrics
- Requests or queries per second
- Concurrent users
- Monthly active users
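These metrics are related by back-of-envelope arithmetic. A minimal sketch, assuming illustrative numbers (1M MAU, ~10 requests per user per day, a 10x peak-to-average factor; none of these figures come from the notes):

```python
# Back-of-envelope: translate monthly active users into a request rate.
# All concrete numbers below are illustrative assumptions.

SECONDS_PER_MONTH = 30 * 24 * 3600  # ~2.59M seconds

def estimated_rps(monthly_active_users, requests_per_user_per_month, peak_factor=10):
    """Average and peak requests/sec implied by a given MAU figure.

    peak_factor models traffic concentrating into busy hours (assumed 10x).
    """
    avg = monthly_active_users * requests_per_user_per_month / SECONDS_PER_MONTH
    return avg, avg * peak_factor

avg, peak = estimated_rps(1_000_000, 300)  # 1M MAU, ~10 requests/user/day
print(f"average ~ {avg:.0f} req/s, peak ~ {peak:.0f} req/s")
```

The point of the exercise: a large-sounding MAU number can imply a modest steady-state request rate, so always convert to requests/sec before sizing anything.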
Scaling challenges
Why scaling is hard even with unlimited money:
- A single machine has a finite clock speed and finite RAM
- Coordinating multiple machines is hard (who does what?)
- Sharing data across machines (where does it live, how do concurrent writes resolve?)
- More machines mean a higher probability that at least one is down at any given time
- Users are globally distributed, so communication latency compounds
- Components must trust each other but reject attackers (authentication)
- Software updates must deploy without downtime
Two approaches
- Vertical scaling: upgrade the machine
  - Simplest and most efficient, but hits a hardware ceiling
  - On cloud, often just a VM reboot
- Horizontal scaling: add more machines
  - Coordinating a cluster is complicated, but necessary for global scale
Cloud compute is called “elastic” because you can change size and quantity quickly.
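The simplest coordination strategy for horizontal scaling is a load balancer spreading requests round-robin across a pool of nodes. A minimal sketch (node names are hypothetical; a real balancer would also handle health checks and failures):

```python
from itertools import cycle

class RoundRobinBalancer:
    """Spread incoming requests evenly across a fixed pool of nodes."""

    def __init__(self, nodes):
        self._nodes = cycle(nodes)  # endlessly repeat the node list

    def route(self, request):
        node = next(self._nodes)  # pick the next node in rotation
        return node, request

lb = RoundRobinBalancer(["node-1", "node-2", "node-3"])
for i in range(4):
    node, req = lb.route(f"req-{i}")
    print(node, req)
# Adding capacity means adding entries to the pool; no single machine's
# hardware ceiling limits total throughput (until coordination costs bite).
```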
Simple scaling math
With N nodes, capacity C per node, total request rate R:
- Scalable: each request goes to one node, so R_max = N·C. If each request hits k nodes (k a constant), R_max = N·C/k.
- Not scalable: each request hits all N nodes, so R_max = C.
where:
- N is the number of nodes
- C is per-node capacity (requests/sec)
- R_max is the highest sustainable total request rate
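The formulas above can be checked numerically. A minimal sketch (the N, C, k values are arbitrary examples):

```python
def r_max_scalable(n_nodes, capacity, k=1):
    """Each request touches k nodes: R_max = N*C/k."""
    return n_nodes * capacity / k

def r_max_unscalable(n_nodes, capacity):
    """Each request touches every node: R_max = N*C/N = C, independent of N."""
    return capacity

C = 100  # requests/sec per node
for n in (1, 10, 100):
    print(n, r_max_scalable(n, C), r_max_unscalable(n, C))
# Scalable capacity grows linearly with N; the unscalable case is stuck at C,
# so adding machines buys nothing.
```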