Locality-Sensitive Hashing (LSH) is the retrieval mechanism that makes AGISystem2 fast. Instead of comparing a query against every stored concept, LSH groups similar vectors into buckets. Only concepts in the same bucket need exact comparison, reducing search from O(n) to approximately O(1).

Query vector Hash h(v) → bucket = 42 Hash Buckets Bucket 41: Bucket 42: A B C D ← Only these 4 need exact comparison Bucket 43: Bucket 44: Total concepts: 10,000 Bucket 42 size: 4 Speedup: 2,500×

LSH maps similar vectors to the same bucket. Instead of comparing the query against all 10,000 concepts, we only compare against the 4 concepts in bucket 42. Similar vectors hash to nearby buckets with high probability.

How It Works

LSH uses random hyperplane projections to create hash codes. Each bit of the hash is determined by which side of a random hyperplane the vector falls on. Similar vectors tend to fall on the same side of most hyperplanes, producing similar hash codes and landing in the same or adjacent buckets.

The Retriever maintains multiple hash tables with different random projections. This reduces false negatives: if two similar vectors end up in different buckets in one table, they're likely to share a bucket in another table.

Two-Stage Retrieval

Retrieval happens in two stages:

  1. Candidate selection – Hash the query, fetch all vectors in matching buckets across all tables
  2. Exact ranking – Compute masked L1 distance for candidates, return top-k

This combines LSH's speed (sublinear candidate selection) with exact distance accuracy (precise ranking of candidates).

Implementation

The Retriever component manages LSH tables:

OperationDescription
index(vector, id) Add vector to all hash tables
query(vector, k) Find k nearest neighbors
remove(id) Remove from all tables
rehash() Rebuild tables (after many changes)

Configuration

LSH parameters are set via configuration profiles:

Parameterauto_testprodEffect
numTables 4 16 More tables = fewer false negatives
hashBits 8 12 More bits = smaller buckets
probeRadius 1 2 Check adjacent buckets too

Trade-offs

Related Documentation