Locality-Sensitive Hashing

Stuff Made Here introduced me to this in his jigsaw video.

In computer science, locality-sensitive hashing (LSH) is an algorithmic technique that hashes similar input items into the same “buckets” with high probability.

(Scalar) LSH to find Near Neighbours

  1. Make a Hash Table
  2. Use buckets that are wide (and overlapped, so items go in multiple buckets)
  3. Most values s.t. < c will have at least 1 bucket in common
  4. Most values > c will NOT share a common bucket

Okay, so how do we construct this hash function??

  • If C1 and C2 are highly similar, then with high probability: • h(C1 ) = h(C2)
  • If C1 and C2 are highly dissimilar, then with high probability: • h(C1) ≠ h(C2)