Queueing Theory
The mathematics of waiting lines. Given an arrival process, a service process, and some number of servers, predicts steady-state metrics: average queue length, wait time, utilization. From ECE459 L31.
Why care?
Shows up everywhere queues do: call centres, routers, supermarket tills, CPU schedulers. It's how telcos decide staffing levels. The non-obvious payoff: the latency-vs-utilization curve is a hockey stick, not a line, which is why over-provisioning isn't wasteful but is literally the cost of predictable latency.
Kendall notation: A/S/c/K/N/D
- A: arrival process (M = Markov/Poisson, D = deterministic, G = general)
- S: service-time distribution
- c: number of servers
- K: queue capacity (default ∞)
- N: population size (default ∞)
- D: scheduling discipline (default FIFO)
Common models: M/M/1, M/M/k, M/M/1/K, M/G/1.
Key quantities
- λ: arrival rate
- μ: service rate per server
- ρ = λ/(cμ): utilization, must be < 1 for stability
- L: average number of items in system
- W: average time in system
- Little's Law: L = λW, holds for any stable system
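These quantities can be computed directly for the simplest model, M/M/1 (one server, so ρ = λ/μ, L = ρ/(1 − ρ), W = 1/(μ − λ)). A minimal sketch with illustrative rates, checking Little's Law at the end:

```python
# Sketch: M/M/1 steady-state metrics from the definitions above.
# lam and mu are illustrative numbers, not from the note.

def mm1_metrics(lam: float, mu: float) -> dict:
    """Return utilization, mean number in system, and mean response time."""
    if lam >= mu:
        raise ValueError("unstable: need rho = lam/mu < 1")
    rho = lam / mu            # utilization
    L = rho / (1 - rho)       # mean number in system
    W = 1 / (mu - lam)        # mean time in system (wait + service)
    return {"rho": rho, "L": L, "W": W}

m = mm1_metrics(lam=8.0, mu=10.0)
# Little's Law: L = lam * W holds exactly
assert abs(m["L"] - 8.0 * m["W"]) < 1e-12
print(m)  # rho=0.8, L=4.0, W=0.5
```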
Vocabulary [Liu09]
- Server (teller), customer (requester)
- Wait time (in line), service time (at teller), response time = wait + service
- Residence time: response across multiple visits
- Throughput: rate of completed requests
Stability:
If ρ ≥ 1, queue length grows without bound [HB13].
These are averages, not instant-by-instant guarantees: short overshoots recover.
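The stability claim can be checked numerically with the Lindley recursion for per-customer wait, W_{n+1} = max(0, W_n + S_n − A_n). A sketch with illustrative rates: below ρ = 1 the average wait settles, above it the average keeps growing with sample count.

```python
# Sketch: Lindley recursion for FIFO wait times. Rates are illustrative.
import random

def mean_wait(lam: float, mu: float, n: int = 50_000, seed: int = 1) -> float:
    """Average wait over n customers with Poisson arrivals, exp. service."""
    rng = random.Random(seed)
    w = 0.0
    total = 0.0
    for _ in range(n):
        # wait = previous wait + previous service - interarrival gap, floored at 0
        w = max(0.0, w + rng.expovariate(mu) - rng.expovariate(lam))
        total += w
    return total / n

print(mean_wait(9.0, 10.0))   # rho = 0.9: bounded
print(mean_wait(11.0, 10.0))  # rho = 1.1: grows with n, no steady state
```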
The hockey stick
Plot W vs ρ. Roughly linear at low ρ, effectively vertical as ρ → 1: since W ∝ 1/(1 − ρ), a server at 99% utilization waits ~10× longer than at 90%, and ~100× the bare service time. Production systems deliberately target ρ well below 1.
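The hockey stick in three lines, assuming M/M/1 with mean service time 1 so that W = 1/(1 − ρ) is measured in multiples of the service time (values illustrative):

```python
# Sketch: response time blow-up as utilization approaches 1 (M/M/1, mu = 1).
for rho in (0.5, 0.8, 0.9, 0.95, 0.99):
    W = 1 / (1 - rho)  # mean response time in units of the service time
    print(f"rho={rho:.2f}  W={W:6.1f}x service time")
# 0.50 -> 2x, 0.90 -> 10x, 0.99 -> 100x: the last 9% of utilization
# costs 10x the latency of the first 90%.
```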
Doubling both λ and μ
Halves response time. Not unchanged: halved, because W = 1/(μ − λ) and doubling both doubles the gap. If the boss doubles the arrival rate, the CPU doesn't need to be 2× faster to preserve response time; it needs less. W is sensitive to the gap μ − λ, not the ratio λ/μ.
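Worked numerically, with illustrative rates: doubling both rates halves W, and merely preserving W when λ doubles needs only μ + λ, not 2μ.

```python
# Sketch: W = 1/(mu - lam) depends on the gap, not the ratio. Numbers illustrative.
lam, mu = 8.0, 10.0
W0 = 1 / (mu - lam)              # 0.5: original response time
W_double = 1 / (2 * mu - 2 * lam)  # 0.25: doubling both rates halves W
assert abs(W_double - W0 / 2) < 1e-12

# To merely PRESERVE W0 after arrivals double, the gap must stay the same:
mu2 = 2 * lam + (mu - lam)       # = mu + lam = 18: only 1.8x faster, not 2x
assert abs(1 / (mu2 - 2 * lam) - W0) < 1e-12
print(W0, W_double, mu2)
```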
One fast server vs. many slow ones
With non-preemptible jobs, it depends:
- High variability in job sizes: many servers, so "the guy with 85 items" doesn't block the milk-and-eggs line
- Low load: one fast server, donāt leave slower ones idle
- Preemptible jobs: one fast server can time-slice to simulate the slow ones, so it's never worse
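The low-load case can be made concrete with the standard Erlang C formula for M/M/k mean response time, comparing one fast server (rate kμ) against k slow ones (rate μ each) at the same utilization. Parameters are illustrative; note this assumes exponential service, so the fast server wins — the note's high-variability bullet is exactly where this picture flips.

```python
# Sketch: one fast M/M/1 server vs. k slow M/M/k servers via Erlang C.
from math import factorial

def mmk_response(lam: float, mu: float, k: int) -> float:
    """Mean response time of an M/M/k queue (Erlang C formula)."""
    a = lam / mu                                  # offered load
    assert a < k, "unstable: need lam < k*mu"
    head = sum(a**i / factorial(i) for i in range(k))
    tail = a**k / factorial(k) * k / (k - a)
    p_wait = tail / (head + tail)                 # P(arrival must queue)
    return p_wait / (k * mu - lam) + 1 / mu       # mean queueing wait + service

lam, mu, k = 8.0, 2.5, 4                          # rho = 0.8 in both setups
fast = 1 / (k * mu - lam)                         # one M/M/1 server at rate k*mu
slow = mmk_response(lam, mu, k)
print(fast, slow)  # fast = 0.5 < slow ~ 0.70: the single fast server wins here
```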
Closed vs. open systems
- Open: arrivals independent of departures. Server upgrades help if you're not already near bottleneck
- Closed: fixed jobs in flight. The bottleneck device dominates; upgrading a non-bottleneck server can do nothing until you raise or fix the bottleneck. Intuition from open systems fails here
Improving μ doesn't always improve throughput
If arrivals are the limit, adding service capacity raises maximum throughput, not realized throughput. "Complete six assignments but the prof only gave you four." In closed/batch systems throughput tracks μ directly because there's always work.
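The assignments example in two lines, with illustrative numbers: in an open system, realized throughput is min(λ, cμ), so raising capacity past λ changes nothing observable.

```python
# Sketch: realized vs. maximum throughput in an open system. Numbers illustrative.
lam = 4.0           # assignments arriving per week
mu, c = 2.0, 3      # completion rate per "server", number of servers
max_tput = c * mu               # 6: capacity ("complete six")
realized = min(lam, max_tput)   # 4: arrivals are the limit ("only gave you four")
# Upgrading to c = 4 raises max_tput to 8, but realized throughput stays 4.
assert realized == min(lam, 4 * mu)
print(max_tput, realized)
```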
Measuring honestly [HB13]
Naive single-job benchmarks miss caching and concurrency effects:
- Open system: ramp λ until completion rate flatlines; the plateau is the maximum throughput
- Closed system: drive think time to zero (always-on workload), measure completions/sec
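A closed-system measurement can be sketched as N always-busy workers with zero think time hammering the operation under test; the worker count and dummy workload here are placeholders (and in CPython the GIL caps CPU-bound scaling, so a real benchmark would drive an I/O-bound service instead):

```python
# Sketch: closed-loop benchmark, zero think time, measure completions/sec.
import threading
import time

def bench(n_workers: int, seconds: float = 1.0) -> float:
    """Completions/sec with n_workers always-on workers."""
    done = [0] * n_workers
    stop = time.monotonic() + seconds
    def worker(i: int) -> None:
        while time.monotonic() < stop:
            sum(range(1000))       # stand-in for the real request
            done[i] += 1
    threads = [threading.Thread(target=worker, args=(i,)) for i in range(n_workers)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return sum(done) / seconds

for n in (1, 2, 4):
    print(n, bench(n, 0.2))  # ramp workers until completions/sec flatlines
```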
Related
- M/M/1
- M/M/k
- Queuing Network Model
- Queueing Complications (balking, reneging, priority, FastPass)
- Little's Law
- Load Balancing