M/M/1 Queue

The simplest nontrivial queue: Poisson arrivals (rate ), exponential service times (rate ), one server, FIFO, infinite capacity.

Utilization:

Steady-state results (memorize these for queueing questions):

Where / include the one in service; / exclude it. All follow from Little + the geometric distribution of queue length.

The “hockey stick”: plot vs . Linear up to about , vertical past 0.9. That’s why systems in production typically target around 0.5–0.7; the latency penalty for pushing utilization higher is brutal.

The exponential service-time assumption is unrealistic; real service times have more variance than exponential. M/G/1 (general service distribution) captures this via the Pollaczek–Khinchine formula, where latency scales with the variance of service time, not just the mean.

From ECE459 L32

Markov = memoryless: arrivals Poisson, inter-arrivals exponential, service exponential. Utilization (arrival rate × service time).

Jeff’s working formulas [Wil13b]:

  • : average completion time (wait + service).
  • : average queue length.
  • : pure waiting time.

Jobs-in-system: . Prob of exactly jobs: . Prob : .

Example [Wil13b]

Server: ms, exponential. Over 30 min, 117 000 jobs. req/s, so .

  • ms.
  • jobs.

Printer example [Wil13b]

min, jobs every 2.5 min ⇒ , . min. Ouch.

Buy a 2× faster printer? , , min (1:40). Much better, when utilization is low. Compare against the two-printer solution in k.