Sparse Polynomial HDC (SPHDC) represents concepts as small finite sets of integers in a virtually infinite space. Instead of fixed-length bit vectors, each concept is a set of k elements drawn from the integers modulo 2⁶⁴.
In SPHDC, a concept is not a binary pattern but a finite set of integers. The "polynomial" in the name comes from viewing each integer as an exponent in a polynomial over GF(2), though the implementation directly uses set operations.
An SPVector V is a finite set of k distinct 64-bit integers:
V = { e0, e1, ..., ek-1 } where ei ∈ Z_{2⁶⁴}
The name comes from the mathematical equivalence between sets of exponents and polynomials over GF(2):
V = { e0, e1, ..., ek-1 } ↔ p(x) = x^e0 + x^e1 + ... + x^ek-1
Under this view, adding two polynomials in GF(2)[x] (coefficient-wise XOR) corresponds exactly to taking the symmetric difference of their exponent sets. However, SPHDC uses a different binding operation (Cartesian product with XOR) that preserves structure better than pure polynomial XOR.
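The correspondence above can be checked in a few lines. This is an illustrative sketch, not part of the SPHDC implementation itself: it shows that polynomial addition over GF(2) is exactly the symmetric difference of exponent sets.

```python
# Illustrative sketch: a set of exponents {e0, ..., ek-1} stands for the
# polynomial x^e0 + ... + x^ek-1 with coefficients in GF(2).

def poly_add(a: set, b: set) -> set:
    """Addition in GF(2)[x]: shared terms cancel (1 + 1 = 0), which on the
    exponent sets is exactly the symmetric difference."""
    return a ^ b  # Python's set symmetric-difference operator

A = {5, 9}  # x^5 + x^9
B = {3, 9}  # x^3 + x^9
print(poly_add(A, B))  # {3, 5}: the shared x^9 terms cancel
```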
The parameter k is the number of integers in each SPVector. It controls the trade-off between:
| k Value | Memory | Binding Ops | Interpretation |
|---|---|---|---|
| k=1 | 8 bytes | 1 | Single hash per concept (minimal) |
| k=2 | 16 bytes | 4 | Pair of identifiers |
| k=4 (default) | 32 bytes | 16 | Balanced: distinct yet compact |
| k=8 | 64 bytes | 64 | Higher fidelity representations |
| k=16 | 128 bytes | 256 | Rich similarity gradients |
The binding operation in SPHDC creates associations between concepts using the Cartesian product with XOR.
Given sets A and B, their binding A BIND B is:
A BIND B = { a xor b | a ∈ A, b ∈ B }
Every element of A is XORed with every element of B, producing |A| × |B| results.
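A minimal sketch of Cartesian XOR binding, following the definition above (the function name `bind` is our choice, not mandated by the source):

```python
def bind(a: frozenset, b: frozenset) -> frozenset:
    """Cartesian XOR binding: XOR every element of a with every element of b.
    The result has up to |a| * |b| elements (XOR collisions can shrink it)."""
    return frozenset(x ^ y for x in a for y in b)

A = frozenset({5, 9})
B = frozenset({3, 7})
print(sorted(bind(A, B)))  # [2, 6, 10, 14]
```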
XOR is its own inverse: x xor x = 0. This gives Cartesian XOR binding a crucial property:
(A BIND B) BIND B ≈ A
Binding with B, then binding again with B, approximately recovers A. This enables reversible encoding of structured information.
A = {5, 9}
B = {3, 7}
A BIND B = {5 xor 3, 5 xor 7, 9 xor 3, 9 xor 7}
= {6, 2, 10, 14}
(A BIND B) BIND B = {6, 2, 10, 14} BIND {3, 7}
= {6 xor 3, 6 xor 7, 2 xor 3, 2 xor 7, ...}
= {5, 1, 1, 5, 9, 13, 13, 9}
After removing duplicates: {1, 5, 9, 13}
The original elements of A ({5, 9}) are recovered; the extra elements (1, 13) are noise that sparsification and similarity matching filter out.
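The worked example can be verified directly. This sketch reuses the Cartesian XOR binding defined earlier and checks that every element of A survives the round trip:

```python
def bind(a: set, b: set) -> set:
    # Cartesian XOR binding: XOR every element of a with every element of b
    return {x ^ y for x in a for y in b}

A = {5, 9}
B = {3, 7}
AB = bind(A, B)
recovered = bind(AB, B)
print(sorted(AB))         # [2, 6, 10, 14]
print(sorted(recovered))  # [1, 5, 9, 13]
print(A <= recovered)     # True: all of A survives, alongside noise {1, 13}
```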
When binding two k-sets, the result has up to k² elements. To maintain constant memory, we sparsify back to k elements.
Given a set S with |S| > k, keep the k elements of S whose hash values are smallest:
sparsify(S, k) = the k elements e ∈ S with the smallest hash(e)
Min-Hash has a remarkable property: for a random hash function, the probability that two sets share the same minimum-hash element equals their Jaccard similarity. Sparsification therefore preserves the essential similarity structure.
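A sketch of min-hash sparsification. The mixing function here is the splitmix64 finalizer, used purely as a stand-in; the source does not specify which hash SPHDC uses, and any good 64-bit integer hash works:

```python
MASK = (1 << 64) - 1  # keep all arithmetic in 64 bits

def mix64(x: int) -> int:
    """Stand-in 64-bit hash (splitmix64 finalizer); an assumption, not
    necessarily the hash SPHDC itself uses."""
    x = (x + 0x9E3779B97F4A7C15) & MASK
    x = ((x ^ (x >> 30)) * 0xBF58476D1CE4E5B9) & MASK
    x = ((x ^ (x >> 27)) * 0x94D049BB133111EB) & MASK
    return x ^ (x >> 31)

def sparsify(s: set, k: int) -> set:
    """Keep the k elements with the smallest hash values (min-hash selection).
    Deterministic: the same set always sparsifies the same way."""
    if len(s) <= k:
        return set(s)
    return set(sorted(s, key=mix64)[:k])

dense = {1, 2, 5, 9, 13, 14}
print(len(sparsify(dense, 4)))  # 4
```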
Similarity between SPVectors is measured by how much their sets overlap.
sim(A, B) = |A ∩ B| / |A ∪ B|
The ratio of shared elements to total unique elements. Range: [0, 1].
A = {1, 5, 9, 12} (k=4)
B = {5, 9, 15, 20} (k=4)
Intersection: A ∩ B = {5, 9} |A∩B| = 2
Union: A ∪ B = {1, 5, 9, 12, 15, 20} |A∪B| = 6
Jaccard: sim(A, B) = 2/6 = 0.333
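The similarity measure and the worked example above translate directly to code (the empty-set convention below is our assumption, not stated in the source):

```python
def jaccard(a: set, b: set) -> float:
    """Jaccard similarity: shared elements over total unique elements, in [0, 1]."""
    if not a and not b:
        return 1.0  # convention for two empty sets (assumption)
    return len(a & b) / len(a | b)

A = {1, 5, 9, 12}
B = {5, 9, 15, 20}
print(round(jaccard(A, B), 3))  # 0.333  (2 shared / 6 total)
```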
Each concept gets a deterministic SPVector generated from its name.
seed = DJB2("Dog")

Structured knowledge is encoded by binding concepts with position vectors.
fact(loves, John, Mary) = Loves BIND (Pos1 BIND John) BIND (Pos2 BIND Mary)
Position vectors (Pos1, Pos2, ...) are pre-defined constants that distinguish argument positions.
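Putting the pieces together, a fact can be encoded as sketched below. This assumes the DJB2-seeded generator mentioned earlier; the exact generation scheme, and the use of sorted order as a stand-in min-hash in `sparsify`, are illustrative assumptions:

```python
import random

K = 4                   # default sparsity
MASK = (1 << 64) - 1    # 64-bit arithmetic

def djb2(s: str) -> int:
    # DJB2 string hash: deterministic seed per concept name
    h = 5381
    for ch in s:
        h = (h * 33 + ord(ch)) & MASK
    return h

def make_vector(name: str, k: int = K) -> frozenset:
    """Deterministic SPVector: k distinct 64-bit integers seeded by the name.
    (Sketch; the real generator may differ.)"""
    rng = random.Random(djb2(name))
    out = set()
    while len(out) < k:
        out.add(rng.getrandbits(64))
    return frozenset(out)

def bind(a, b):
    # Cartesian XOR binding
    return frozenset(x ^ y for x in a for y in b)

def sparsify(s, k=K):
    # Min-hash selection; identity used as the hash here for brevity
    return frozenset(sorted(s)[:k]) if len(s) > k else frozenset(s)

Loves, John, Mary = make_vector("Loves"), make_vector("John"), make_vector("Mary")
Pos1, Pos2 = make_vector("Pos1"), make_vector("Pos2")

# fact(loves, John, Mary) = Loves BIND (Pos1 BIND John) BIND (Pos2 BIND Mary)
fact = sparsify(bind(bind(Loves, sparsify(bind(Pos1, John))),
                     sparsify(bind(Pos2, Mary))))
print(len(fact))  # 4: constant memory regardless of fact structure
```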
To answer queries, we "unbind" the known parts to recover the unknown.
Since XOR-based binding cancels known parts, to find ?who in "loves(?who, Mary)":
candidate = KB BIND Loves BIND (Pos2 BIND Mary)
Then match candidate against vocabulary to find the answer.
# KB contains: loves(John, Mary) encoded as fact_vector
# Query: Who loves Mary? (loves(?who, Mary))
partial_query = Loves BIND (Pos2 BIND Mary)
# Unbind from KB
candidate = KB BIND partial_query
= fact_vector BIND Loves BIND (Pos2 BIND Mary)
= (Pos1 BIND John) + noise
# Extract Pos1
raw_answer = candidate BIND Pos1 ≈ John + noise
# Match against vocabulary
best_match = argmax(sim(raw_answer, vocab[name]))
= "John"