Reservoir Sampling
This was an algo introduced in CS451.
Problem: Select S values uniformly at random from a stream of N values Assumption: N is very big, and not known ahead of time (it’s a stream!)
Solution:
- Store first S values
- When receiving kth keep it with probability S/k
- If keeping it, randomly discard one of the existing S elements to make room