Locality-Sensitive Hashing (LSH) is the retrieval mechanism that makes AGISystem2 fast. Instead of comparing a query against every stored concept, LSH groups similar vectors into buckets. Only concepts in the same bucket need exact comparison, reducing search from O(n) to approximately O(1).
LSH maps similar vectors to the same bucket. Instead of comparing the query against all 10,000 concepts, we only compare against the 4 concepts in bucket 42. Similar vectors hash to nearby buckets with high probability.
LSH uses random hyperplane projections to create hash codes. Each bit of the hash is determined by which side of a random hyperplane the vector falls on. Similar vectors tend to fall on the same side of most hyperplanes, producing similar hash codes and landing in the same or adjacent buckets.
The Retriever maintains multiple hash tables with different random projections. This reduces false negatives: if two similar vectors end up in different buckets in one table, they're likely to share a bucket in another table.
Retrieval happens in two stages:
This combines LSH's speed (sublinear candidate selection) with exact distance accuracy (precise ranking of candidates).
The Retriever component manages LSH tables:
| Operation | Description |
|---|---|
index(vector, id) |
Add vector to all hash tables |
query(vector, k) |
Find k nearest neighbors |
remove(id) |
Remove from all tables |
rehash() |
Rebuild tables (after many changes) |
LSH parameters are set via configuration profiles:
| Parameter | auto_test | prod | Effect |
|---|---|---|---|
| numTables | 4 | 16 | More tables = fewer false negatives |
| hashBits | 8 | 12 | More bits = smaller buckets |
| probeRadius | 1 | 2 | Check adjacent buckets too |