M/M/1 Queue
The simplest nontrivial queue: Poisson arrivals (rate ), exponential service times (rate ), one server, FIFO, infinite capacity.
Utilization:
Steady-state results (memorize these for queueing questions):
Where / include the one in service; / exclude it. All follow from Little + the geometric distribution of queue length.
The “hockey stick”: plot vs . Linear up to about , vertical past 0.9. That’s why systems in production typically target around 0.5–0.7; the latency penalty for pushing utilization higher is brutal.
The exponential service-time assumption is unrealistic; real service times have more variance than exponential. M/G/1 (general service distribution) captures this via the Pollaczek–Khinchine formula, where latency scales with the variance of service time, not just the mean.
From ECE459 L32
Markov = memoryless: arrivals Poisson, inter-arrivals exponential, service exponential. Utilization (arrival rate × service time).
Jeff’s working formulas [Wil13b]:
- : average completion time (wait + service).
- : average queue length.
- : pure waiting time.
Jobs-in-system: . Prob of exactly jobs: . Prob : .
Example [Wil13b]
Server: ms, exponential. Over 30 min, 117 000 jobs. req/s, so .
- ms.
- jobs.
Printer example [Wil13b]
min, jobs every 2.5 min ⇒ , . min. Ouch.
Buy a 2× faster printer? , , min (1:40). Much better, when utilization is low. Compare against the two-printer solution in k.