Load Testing
Exercising a system under expected workload to answer “can we handle load X?” or “what is the maximum Y we can handle?”. Not the same as stress testing, which cranks pressure until things break.
Why?
Scalability goals (1 user to 100 to 10 million) require numbers, not hunches. C-level questions like “can we handle 10x users?” need evidence-backed answers.
Two workload schedules:
- Steady load: constant arrival rate held for the duration
- Stepwise load: incrementally increasing rate in discrete steps
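The two schedules can be sketched as request-arrival timestamp generators. A minimal sketch (function names and parameters are illustrative, not from the source):

```python
def steady_schedule(rate_per_s: float, duration_s: float) -> list[float]:
    """Constant arrival rate: one request every 1/rate seconds,
    held for the whole duration."""
    interval = 1.0 / rate_per_s
    return [i * interval for i in range(int(duration_s * rate_per_s))]

def stepwise_schedule(start_rate: float, step: float,
                      step_duration_s: float, n_steps: int) -> list[float]:
    """Rate increases by `step` req/s after each step window,
    so pressure rises in discrete increments."""
    times, offset = [], 0.0
    for k in range(n_steps):
        rate = start_rate + k * step
        times += [offset + i / rate for i in range(int(step_duration_s * rate))]
        offset += step_duration_s
    return times
```

A driver would then fire one request at each timestamp; stepwise schedules are the natural fit for “what is the maximum Y?” questions, since you can see at which step the system degrades.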
Start with why [Mel21]
The reason drives the design:
- New system: establish average workload + safety buffer
- Expected growth (10x users): find bottlenecks
- Spike (tax season, Black Friday): hard part is simulating the spike
- High uptime (99.99%): endurance test on top of load test
- “Performance testing” is on the checklist with no real reason: bail
What to test
- Not 100% coverage. Start with what observability flags as slow, or the critical path per product requirements
- Compute-heavy workflows, UX-sensitive flows (if signup takes > 2 s, users quit), hard external deadlines (1 s to approve/decline)
- Low current utilization means you have to guess the rate-limiting step and revise as you ramp up
How to test
Hardware principle [Liu09]
Test on production-equivalent hardware. A 16 GB laptop is not a 128 GB server, and limiting factors differ wildly. Otherwise you waste time optimizing RAM when RAM is not the problem.
Reality principle
Use real workload shapes. Legal may block real customer data, so use the best approximation.
War story (JZ)
A DB migration timed out on prod because the test DB was much smaller. “Managers run the report monthly” turned into “managers run it hourly to watch their team.” Plan types, customer sizes, entity counts all matter.
Volume principle
“More is the new more.” You cannot fake real pressure. Faking CPU pressure (encoding video in a loop) does not reveal lock contention that only shows up with 500 real users.
Reproducibility
Two runs on the same code should produce similar results. Unlike unit tests, load tests have real randomness (generated data, scheduler, luck), so aim for similarity, not identity.
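“Similar, not identical” can be made concrete with a tolerance check on an aggregate metric. A minimal sketch, assuming latency samples in milliseconds and a hypothetical 10% drift budget:

```python
import statistics

def runs_similar(run_a: list[float], run_b: list[float],
                 rel_tol: float = 0.10) -> bool:
    """Compare two load-test runs by median latency.
    Real randomness (generated data, scheduler, luck) means the bar
    is similarity within rel_tol, not identical numbers."""
    a, b = statistics.median(run_a), statistics.median(run_b)
    return abs(a - b) <= rel_tol * max(a, b)
```

The median is used here because it is robust to a few outliers; the same check can be applied to p95/p99 if those are the metrics you report.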
Endurance tests
Running analogy
Jeff can run 10 km/h for 1 hour, not 4. Looking at a 15-min sample you’d conclude he could run forever.
CPUs don’t get tired, but software accumulates “fatigue”: memory leaks (java.lang.OutOfMemoryError), swap thrashing, file handle exhaustion, disk fill, log growth.
Holiday freeze (JZ)
Services ran unrestarted long enough to hit internal-resource limits. Fix was rolling restarts. This is an endurance problem even at low load.
Picking duration has no universal rule. Use product requirements (e-commerce: 5 days across Thanksgiving to Cyber Monday) or maintenance windows (SLA allows downtime Sun 02:00 to 03:00, so must survive at least a week).
Evaluating success
Raw results rarely suffice. Post-process, aggregate, correlate with external factors. Criteria:
- Total work completed within total time limit?
- Individual item time met 99% of the time?
For endurance, look at the trend: does the “yes” stay yes across the whole window?
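The two criteria can be combined into one post-processing check. A minimal sketch, assuming sequentially processed items with latencies in ms (so total time is the sum) — limits and names are illustrative:

```python
def run_succeeded(latencies_ms: list[float],
                  item_limit_ms: float,
                  total_limit_ms: float) -> bool:
    """Success = total work finished within the total time budget,
    AND at least 99% of individual items met the per-item limit."""
    within = sum(1 for lat in latencies_ms if lat <= item_limit_ms)
    return (sum(latencies_ms) <= total_limit_ms
            and within / len(latencies_ms) >= 0.99)
```

For an endurance run, evaluate this per window (say, hourly) and inspect the sequence of booleans: a run that flips from True to False partway through has a fatigue problem, not a capacity problem.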
When you fail
Apply course techniques to the specific slow scenario, re-test, repeat. Software has limits (Jeff will not beat Kipchoge’s 2:01:09). If you have hit the wall, consider redesign, or rethink the constraints: don’t bill all customers on the same day if you can spread billing across the month.
Constant vigilance
Repeat load tests regularly. Software grows in complexity faster than hardware improves (in the current era), so catch regressions before prod does. Example: https://arewefastyet.com tracks Firefox perf.