Reservoir Sampling

This was an algo introduced in CS451.

Problem: Select S values uniformly at random from a stream of N values Assumption: N is very big, and not known ahead of time (it’s a stream!)

Solution:

  • Store first S values
  • When receiving kth keep it with probability S/k
  • If keeping it, randomly discard one of the existing S elements to make room