Locality-Sensitive Hashing
Stuff Made Here introduced me to this in his jigsaw video.
In computer science, locality-sensitive hashing (LSH) is an algorithmic technique that hashes similar input items into the same “buckets” with high probability.
(Scalar) LSH to find Near Neighbours
- Make a Hash Table
- Use buckets that are wide (and overlapped, so items go in multiple buckets)
- Most values s.t. < c will have at least 1 bucket in common
- Most values > c will NOT share a common bucket
Okay, so how do we construct this hash function??
- If C1 and C2 are highly similar, then with high probability: • h(C1 ) = h(C2)
- If C1 and C2 are highly dissimilar, then with high probability: • h(C1) ≠ h(C2)