Bloom Filter

Introduced in CS451.

Problem Setup

We want to check if a set contains . However, this set is too large to hold in memory.

Solution: a bit-vector of bits, and unique hash functions.

I don't understand how this is different from a hash map??

Bloom filters are much more memory efficient.

  • Bloom filters use a fixed-size bit array and multiple hash functions to represent membership

False Positives

Bloom Filter: Can have false positives, meaning it might incorrectly say an element exists in the set when it doesn’t.

  • However, it will never have false negatives—if the filter says an element doesn’t exist, it definitely doesn’t

Bloom Filter also does not support deletion because flipping bits back in the bit array could interfere with other elements.

Example

Bloom Filters has No false negatives. If it’s been seen, all of it’s bits are set to 1

  • False positives: The more unique elements seen, the more bits are set to 1.
  • Can tune false positive rate by adjusting values for and