Load Balancing
Distributing incoming work across multiple servers so no single one is overwhelmed. Covered formally in SE464.
Resources:
- AWS: What is load balancing?
- AWS Elastic Load Balancing
- Mastering Chaos: A Netflix Guide to Microservices
- Wikipedia: Load balancing (computing)
From ECE459 L31
Typical “server farm” shape: approximately identical servers behind a dispatcher that assigns jobs. Alternative: work-stealing, post-hoc reassignment when one queue piles up.
Task-assignment policies [HB13]
- Random: exactly what it sounds like
- Round-robin: job goes to server
- Shortest-Queue: pick the server with the shortest queue
- Size-Interval-Task-Assignment (SITA): short jobs to one server, medium to another, long to a third
- Least-Work-Left: pick the server with the smallest sum of remaining work
- Central-Queue: servers pull from a shared queue when idle (no direct assignment)
Which minimizes mean response time? Nobody knows. It depends on job-size variability and other factors; under-studied. Jeff: “PhD, anyone?“.