Sparse Polynomial HDC (SPHDC) represents concepts as small finite sets of integers in a virtually infinite space. Instead of fixed-length bit vectors, each concept is a set of k elements drawn from the integers modulo 2⁶⁴.


1. The Core Idea: Sets as Representations

Fundamental Insight

In SPHDC, a concept is not a binary pattern but a finite set of integers. The "polynomial" in the name comes from viewing each integer as an exponent in a polynomial over GF(2), though the implementation directly uses set operations.

Definition: SPVector

An SPVector V is a finite set of k distinct 64-bit integers:

V = { e0, e1, ..., ek-1 } where each ei is an integer modulo 2⁶⁴

Why "Polynomial"?

The name comes from the mathematical equivalence between sets of exponents and polynomials over GF(2):

# A set of exponents {3, 7, 12} represents the polynomial:
P(x) = x^3 + x^7 + x^12        (coefficients in GF(2), so + is XOR)

# XOR of two sets = XOR of polynomials:
{3, 7, 12} XOR {5, 7} = {3, 5, 12}     (7 cancels: 7 XOR 7 = 0)
                      = x^3 + x^5 + x^12

However, SPHDC uses a different binding operation (Cartesian product) that preserves structure better than pure polynomial XOR.
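The set/polynomial correspondence can be checked directly with Python's set symmetric difference. This is a minimal sketch of the equivalence, not the SPHDC implementation; the function name is ours:

```python
def poly_xor(a: set[int], b: set[int]) -> set[int]:
    """Symmetric difference: an exponent survives only if it appears in
    exactly one set, since coefficients add mod 2 (x^e + x^e = 0 in GF(2))."""
    return a ^ b

p = {3, 7, 12}   # x^3 + x^7 + x^12
q = {5, 7}       # x^5 + x^7

print(sorted(poly_xor(p, q)))  # [3, 5, 12] -- the shared exponent 7 cancels
```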

2. The k Parameter

Definition: k (Set Cardinality)

The parameter k is the number of integers in each SPVector. It controls the trade-off between representational richness and memory/compute cost:

Intuition: Think of k as the "vocabulary size" for each concept. With k=4, each concept is described by exactly 4 unique "words" (64-bit integers). More words = richer description, but more computation.

Understanding k Values

k Value         Memory      Binding Ops   Interpretation
k=1             8 bytes     1             Single hash per concept (minimal)
k=2             16 bytes    4             Pair of identifiers
k=4 (default)   32 bytes    16            Balanced: distinct yet compact
k=8             64 bytes    64            Higher fidelity representations
k=16            128 bytes   256           Rich similarity gradients

3. Binding: Cartesian XOR

The binding operation in SPHDC creates associations between concepts using the Cartesian product with XOR.

Definition: Cartesian XOR Binding

Given sets A and B, their binding A BIND B is:

A BIND B = { a xor b | a ∈ A, b ∈ B }

Every element of A is XORed with every element of B, producing |A| × |B| results.

[Diagram: Cartesian XOR binding. Set A (k=2) = {a0, a1} and Set B (k=2) = {b0, b1} bind to A BIND B (k=4) = {a0 xor b0, a0 xor b1, a1 xor b0, a1 xor b1}: 2 × 2 = 4 elements, all pairs XORed.]
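The binding operation is a one-line set comprehension. A minimal sketch (the function name `bind` is ours):

```python
def bind(a: set[int], b: set[int]) -> set[int]:
    """Cartesian XOR: every element of a XORed with every element of b."""
    return {x ^ y for x in a for y in b}

A = {5, 9}
B = {3, 7}
print(sorted(bind(A, B)))  # [2, 6, 10, 14]
```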

Why Cartesian XOR?

The XOR Cancellation Property

XOR is its own inverse: x xor x = 0. This gives Cartesian XOR binding a crucial property:

(A BIND B) BIND B ≈ A

Binding with B, then binding again with B, approximately recovers A. This enables reversible encoding of structured information.

Example: XOR Cancellation in Action
A = {5, 9}
B = {3, 7}

A BIND B = {5 xor 3, 5 xor 7, 9 xor 3, 9 xor 7}
       = {6, 2, 10, 14}

(A BIND B) BIND B = {6, 2, 10, 14} BIND {3, 7}
              = {6 xor 3, 6 xor 7, 2 xor 3, 2 xor 7,
                 10 xor 3, 10 xor 7, 14 xor 3, 14 xor 7}
              = {5, 1, 1, 5, 9, 13, 13, 9}
              = {1, 5, 9, 13}        (duplicates collapse in a set)

The result contains both elements of the original A = {5, 9}, plus the
noise elements {1, 13}; sparsification and similarity matching against
the vocabulary recover A.
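The worked example above can be replayed in a few lines of Python (a sketch; the helper name is ours):

```python
def bind(a: set[int], b: set[int]) -> set[int]:
    """Cartesian XOR binding."""
    return {x ^ y for x in a for y in b}

A = {5, 9}
B = {3, 7}
roundtrip = bind(bind(A, B), B)

print(sorted(roundtrip))  # [1, 5, 9, 13]
print(A <= roundtrip)     # True: both original elements of A recovered
```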

4. Sparsification: Keeping Sets Small

When binding two k-sets, the result has up to k² elements. To maintain constant memory, we sparsify back to k elements.

Definition: Min-Hash Sparsification

Given a set S with |S| > k, select the k elements with the smallest hash values:

sparsify(S, k) = the k elements e ∈ S with the smallest hash(e)

Intuition: Min-Hash sampling is like taking a "representative sample" of the set. Elements with the smallest hashes act as "fingerprints" that consistently identify the set, even when the full set is too large to store.
[Diagram: Min-Hash sparsification. Before: 16 elements; keep the k=4 smallest hashes; after: k=4 elements.]

Why Min-Hash Works

Min-Hash has a remarkable property: the probability that two sets share the same minimum hash equals their Jaccard similarity. This means sparsification preserves the essential similarity structure.
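A minimal sketch of Min-Hash sparsification. The choice of BLAKE2b as the hash function is an assumption for illustration; SPHDC's actual hash is not specified here:

```python
import hashlib

def hash64(e: int) -> int:
    """Deterministic 64-bit hash of an element (assumed: BLAKE2b)."""
    digest = hashlib.blake2b(e.to_bytes(8, "little"), digest_size=8).digest()
    return int.from_bytes(digest, "little")

def sparsify(s: set[int], k: int) -> set[int]:
    """Keep the k elements with the smallest hash values."""
    if len(s) <= k:
        return set(s)
    return set(sorted(s, key=hash64)[:k])

big = set(range(16))
small = sparsify(big, 4)
print(len(small))  # 4
```

Because the hash is deterministic, the same input set always sparsifies to the same k elements, which is what makes repeated sparsification stable.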

5. Similarity: Jaccard Index

Similarity between SPVectors is measured by how much their sets overlap.

Definition: Jaccard Similarity

sim(A, B) = |A ∩ B| / |A ∪ B|

The ratio of shared elements to total unique elements. Range: [0, 1].

[Diagram: Jaccard index J(A,B) = |A∩B| / |A∪B|. 0.0 = disjoint (no overlap); 0.5 = half of the elements shared; 1.0 = identical sets.]
Example: Computing Jaccard Similarity
A = {1, 5, 9, 12}      (k=4)
B = {5, 9, 15, 20}     (k=4)

Intersection: A ∩ B = {5, 9}         |A∩B| = 2
Union:        A ∪ B = {1, 5, 9, 12, 15, 20}  |A∪B| = 6

Jaccard:      sim(A, B) = 2/6 = 0.333
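The Jaccard computation is a one-liner over Python sets. This sketch mirrors the example above; the empty-set convention is our assumption:

```python
def jaccard(a: set[int], b: set[int]) -> float:
    """Jaccard similarity: shared elements over total unique elements."""
    if not a and not b:
        return 1.0  # convention assumed here: two empty sets are identical
    return len(a & b) / len(a | b)

A = {1, 5, 9, 12}
B = {5, 9, 15, 20}
print(round(jaccard(A, B), 3))  # 0.333
```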

6. Creating Concept Vectors

Each concept gets a deterministic SPVector generated from its name.

Algorithm: Deterministic Vector Creation
  1. Hash the concept name to get a seed: seed = DJB2("Dog")
  2. Initialize a PRNG with the seed
  3. Generate k unique random 64-bit integers using the PRNG
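The three steps above can be sketched as follows. The 32-bit DJB2 variant and Python's built-in PRNG are illustrative assumptions; the actual implementation may differ in both:

```python
import random

def djb2(s: str) -> int:
    """Classic DJB2 string hash, masked to 32 bits (assumed variant)."""
    h = 5381
    for ch in s:
        h = ((h << 5) + h + ord(ch)) & 0xFFFFFFFF
    return h

def make_vector(name: str, k: int = 4) -> set[int]:
    """Deterministic SPVector: seed a PRNG from the name, then draw
    k unique 64-bit integers."""
    rng = random.Random(djb2(name))
    v: set[int] = set()
    while len(v) < k:
        v.add(rng.getrandbits(64))
    return v

print(make_vector("Dog") == make_vector("Dog"))  # True: same name, same vector
print(len(make_vector("Dog")))                   # 4
```

Determinism matters: any party that knows the name "Dog" can regenerate exactly the same SPVector without storing it.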


[Diagram: the name "Dog" is hashed with DJB2 to the seed 0x7b8c3d2e, which initializes a PRNG that emits SPVector(Dog) = {0x7a3f9c..., 0x2c8e1f..., 0x5d1a8c..., 0x9f4b2d...} (k=4 exponents).]

7. Encoding Relations

Structured knowledge is encoded by binding concepts with position vectors.

Relation Encoding Formula

fact(loves, John, Mary) = Loves BIND (Pos1 BIND John) BIND (Pos2 BIND Mary)

Position vectors (Pos1, Pos2, ...) are pre-defined constants that distinguish argument positions.

[Diagram: Loves BIND (Pos1 BIND John) BIND (Pos2 BIND Mary) = fact vector, a set of k exponents encoding "loves(John, Mary)".]

Why Position Vectors?

Without positions, loves(John, Mary) would encode identically to loves(Mary, John) (wrong!). With positions, different argument orderings produce different vectors.
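Putting binding and sparsification together, relation encoding might look like the following sketch. The name-based seeding and the BLAKE2b min-hash are illustrative assumptions, not necessarily what SPHDC uses:

```python
import hashlib
import random

def make_vector(name: str, k: int = 4) -> set[int]:
    """Deterministic vector from a name (seeding is illustrative;
    the article derives the seed via DJB2)."""
    rng = random.Random(name)
    v: set[int] = set()
    while len(v) < k:
        v.add(rng.getrandbits(64))
    return v

def hash64(e: int) -> int:
    digest = hashlib.blake2b(e.to_bytes(8, "little"), digest_size=8).digest()
    return int.from_bytes(digest, "little")

def bind(a: set[int], b: set[int], k: int = 4) -> set[int]:
    """Cartesian XOR followed by min-hash sparsification back to k."""
    prod = {x ^ y for x in a for y in b}
    return set(sorted(prod, key=hash64)[:k])

Loves, John, Mary = (make_vector(n) for n in ("Loves", "John", "Mary"))
Pos1, Pos2 = make_vector("Pos1"), make_vector("Pos2")

# fact(loves, John, Mary) = Loves BIND (Pos1 BIND John) BIND (Pos2 BIND Mary)
fact = bind(bind(Loves, bind(Pos1, John)), bind(Pos2, Mary))
print(len(fact))  # 4: memory stays constant no matter how much is bound
```

Sparsifying after every bind is what keeps the fact vector at k elements regardless of how deeply structured the encoded relation is.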

8. Querying: The Unbind Operation

To answer queries, we "unbind" the known parts to recover the unknown.

Query Principle

Since XOR-based binding cancels known parts, to find ?who in "loves(?who, Mary)":

candidate = KB BIND Loves BIND (Pos2 BIND Mary)

Then match candidate against vocabulary to find the answer.

Example: Query Resolution
# KB contains: loves(John, Mary) encoded as fact_vector

# Query: Who loves Mary?  (loves(?who, Mary))
partial_query = Loves BIND (Pos2 BIND Mary)

# Unbind from KB
candidate = KB BIND partial_query
         = fact_vector BIND Loves BIND (Pos2 BIND Mary)
         = (Pos1 BIND John) + noise

# Extract Pos1
raw_answer = candidate BIND Pos1 ≈ John + noise

# Match against vocabulary
best_match = argmax(sim(raw_answer, vocab[name]))
           = "John"
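The full query loop can be sketched end to end. To keep the XOR cancellations exact, this sketch omits intermediate sparsification (so the known parts cancel perfectly); the real system sparsifies after each bind and matches approximately. All helper names and the seeding scheme are ours:

```python
import random

def make_vector(name: str, k: int = 4) -> set[int]:
    rng = random.Random(name)  # illustrative seeding (article uses DJB2)
    v: set[int] = set()
    while len(v) < k:
        v.add(rng.getrandbits(64))
    return v

def bind(a: set[int], b: set[int]) -> set[int]:
    """Cartesian XOR binding, no sparsification in this sketch."""
    return {x ^ y for x in a for y in b}

def jaccard(a: set[int], b: set[int]) -> float:
    return len(a & b) / len(a | b)

vocab = {n: make_vector(n) for n in ("John", "Mary", "Alice")}
Loves, Pos1, Pos2 = (make_vector(n) for n in ("Loves", "Pos1", "Pos2"))

# KB holds one fact: loves(John, Mary)
fact = bind(bind(Loves, bind(Pos1, vocab["John"])), bind(Pos2, vocab["Mary"]))

# Query loves(?who, Mary): unbind the known parts, then Pos1.
candidate = bind(bind(bind(fact, Loves), bind(Pos2, vocab["Mary"])), Pos1)

# candidate contains John's elements (forced cancellations) plus noise;
# matching against the vocabulary picks out the answer.
best_match = max(vocab, key=lambda n: jaccard(candidate, vocab[n]))
print(best_match)  # John
```

The candidate set is much larger than k here because nothing was sparsified; only the elements produced by exact cancellation overlap with any vocabulary vector, which is why the Jaccard match singles out "John".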

9. Mathematical Properties

Key Properties of SPHDC