This document provides a rigorous theoretical analysis of the Sparse Polynomial HDC (SPHDC) strategy. SPHDC represents a paradigm shift from classical dense Hyperdimensional Computing, utilizing small finite sets of high-entropy integers instead of large binary vectors.


1. Conceptual Foundation: Massive Entropy in Compact Form

SPHDC represents a fundamental departure from classical dense HDC. Instead of using 10,000-bit vectors, we utilize small finite sets of high-entropy integers. Each integer in the set (an exponent) acts as a coordinate in a hyper-sparse space.

1.1 The Entropy Argument (Capacity Analysis)

Consider a vector with cardinality k = 4, where each element is a 64-bit integer (eᵢ ∈ {0, 1, …, 2⁶⁴ − 1}):

Value space for a single element: 2⁶⁴ ≈ 1.84 × 10¹⁹

Unique combinations for a set of 4 elements:
C(2⁶⁴, 4) = 2⁶⁴(2⁶⁴ − 1)(2⁶⁴ − 2)(2⁶⁴ − 3) / 4! ≈ 2²⁵⁶ / 24 ≈ 2²⁵¹

Conclusion: Even at minimal k, the state space (≈ 10⁷⁶) is astronomically larger than any conceivable vocabulary. The "Bounded Memory" constraint is strictly a computational efficiency choice, not a limit on representational capacity.

We are effectively compressing massive semantic entropy into a tiny memory footprint. This is the key insight that makes SPHDC viable: representational capacity is virtually unlimited, while memory usage remains minimal.

2. Data Structure: SPVector

An SPVector V is defined as a finite set of k distinct exponents:

V = { e1, e2, ..., ek }
Property                            Specification
Data Type                           Set<uint64_t> or SortedArray<uint64_t> (for SIMD efficiency)
Memory Footprint                    k × 8 bytes
Standard Robust Profile (k = 64)    512 bytes
Ultra-Light Profile (k = 4)         32 bytes

This format functions as a Holographic Signature: it captures the structural essence of information using anchor points in the GF(2) mathematical space.
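
As a concrete illustration, here is a minimal Python sketch of the structure (a frozenset of Python ints standing in for Set<uint64_t>; the name random_spvector is illustrative, not part of the specification):

    import random

    def random_spvector(k: int = 4, seed: int | None = None) -> frozenset[int]:
        # Draw k distinct 64-bit exponents uniformly at random.
        rng = random.Random(seed)
        exponents: set[int] = set()
        while len(exponents) < k:
            exponents.add(rng.getrandbits(64))
        return frozenset(exponents)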

3. Mathematical Foundation: GF(2)

To understand why SPHDC works, we must first understand GF(2) - the Galois Field with two elements. This is the algebraic structure that makes XOR-based binding reversible. Read the full GF(2) explanation →

3.1 What is GF(2)?

GF(2) (also written as 𝔽₂ or ℤ/2ℤ) is the simplest possible field. It contains exactly two elements:

GF(2) = { 0, 1 }

With two operations defined:

Addition in GF(2)            Multiplication in GF(2)

  + | 0   1                    × | 0   1
  0 | 0   1                    0 | 0   0
  1 | 1   0                    1 | 0   1

Key Observation: Addition in GF(2) is exactly the XOR operation! And crucially: 1 + 1 = 0 (not 2, because 2 doesn't exist in GF(2)).

3.2 Why GF(2) Addition Has XOR Cancellation

The most important property for SPHDC is that every element is its own additive inverse:

∀ a ∈ GF(2): a + a = 0

Proof:
0 + 0 = 0 ✓
1 + 1 = 0 ✓

This means: to "subtract" in GF(2), you simply add again. There is no difference between + and −.

a + b = c implies c + b = a

Example: 1 + 1 = 0, therefore 0 + 1 = 1

3.3 Polynomials over GF(2)

A polynomial over GF(2) has coefficients that are either 0 or 1:

P(x) = x⁷ + x³ + x + 1

This means: coefficients at positions 7, 3, 1, 0 are 1; all others are 0.
Equivalently: the set of exponents is {0, 1, 3, 7}

The isomorphism: A sparse polynomial over GF(2) can be represented as the set of exponents where the coefficient is 1.

3.4 Polynomial Addition = Set Symmetric Difference

When we add two polynomials over GF(2), coefficients at each position are added mod 2:

P₁(x) = x⁵ + x³ + x → Set: {1, 3, 5}
P₂(x) = x³ + x² + 1 → Set: {0, 2, 3}

P₁ + P₂ = x⁵ + x² + x + 1

Why? At position 3: 1 + 1 = 0 (cancels!)
Result set: {0, 1, 2, 5} = symmetric difference of {1,3,5} and {0,2,3}
Symmetric Difference: A △ B = (A ∪ B) − (A ∩ B)
Elements that appear in exactly one set, not both.
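
Python's set XOR operator computes exactly this symmetric difference, so the example above can be verified directly (a minimal sketch):

    # P1(x) = x^5 + x^3 + x and P2(x) = x^3 + x^2 + 1 as exponent sets
    p1 = {1, 3, 5}
    p2 = {0, 2, 3}

    # GF(2) polynomial addition = symmetric difference of exponent sets;
    # the shared x^3 term cancels.
    assert p1 ^ p2 == {0, 1, 2, 5}   # x^5 + x^2 + x + 1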

3.5 Why This Matters for SPHDC

The GF(2) structure gives us reversible binding:

If we encode: C = A xor B (XOR = GF(2) addition)

Then: C xor B = A xor B xor B = A xor 0 = A

We can recover A by XORing with B again!

This is not magic - it's a direct consequence of the algebraic structure of GF(2) where every element is its own inverse.

3.6 Extension to 64-bit Integers

A 64-bit integer can be viewed as a vector of 64 independent GF(2) elements:

a = 0x5A = 01011010₂
b = 0x3C = 00111100₂

a xor b = 01100110₂ = 0x66

Each bit position operates independently in GF(2).

XOR cancellation holds for the entire 64-bit word:

(a xor b) xor b = a
(0x5A xor 0x3C) xor 0x3C = 0x66 xor 0x3C = 0x5A

3.7 From Bits to Sets: The SPHDC Connection

SPHDC takes this one step further. Instead of operating on bit positions within a fixed-width integer, we operate on sets of exponents representing sparse polynomials:

Traditional GF(2) polynomial: x⁷ + x³ + 1 → bits at positions 0, 3, 7

SPHDC: {e₁, e₂, e₃, e₄} where each eᵢ is a 64-bit "exponent"

The "polynomial" would be: x^e₁ + x^e₂ + x^e₃ + x^e₄
(astronomically sparse - only 4 terms out of 2⁶⁴ possible!)

The Cartesian XOR binding simulates polynomial multiplication in this sparse representation, preserving the algebraic properties of GF(2).

4. Algebraic Operations

The system relies on the isomorphism between sets and sparse polynomials over GF(2), adjusted to maintain constant size.

4.1 Binding: Cartesian XOR ( BIND )

To bind two concepts A and B, we combine every element of A with every element of B. This is analogous to polynomial multiplication that forms every cross term but never collects like terms, followed by a representative selection.

A BIND B = MinHashSelect({ a xor b | a ∈ A, b ∈ B }, k)

Mechanism:

  1. Generate k × k intermediate terms via XOR
  2. Since inputs are 64-bit, the result a xor b is valid within the same space
  3. Apply Min-Hash sparsification to maintain cardinality k
Key Property: a xor a = 0. This enables theoretical reversibility: (A BIND B) BIND B → A
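
A minimal Python sketch of this mechanism follows. The helper names (perm_hash, minhash_select, bind) and the use of BLAKE2b as the permutation hash are illustrative assumptions; any high-quality seeded 64-bit hash would do:

    import hashlib

    def perm_hash(e: int, seed: int = 0) -> int:
        # Map a 64-bit exponent to a pseudo-random 64-bit value under a seed.
        data = seed.to_bytes(8, "little") + e.to_bytes(8, "little")
        return int.from_bytes(hashlib.blake2b(data, digest_size=8).digest(), "little")

    def minhash_select(candidates: set[int], k: int, seed: int = 0) -> frozenset[int]:
        # Keep the k candidates with the smallest permuted hash values.
        return frozenset(sorted(candidates, key=lambda e: perm_hash(e, seed))[:k])

    def bind(a: frozenset[int], b: frozenset[int], k: int) -> frozenset[int]:
        # Cartesian XOR: every pairwise XOR term, then Min-Hash sparsification.
        return minhash_select({x ^ y for x in a for y in b}, k)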

4.2 Superposition: Set Union (+)

To bundle information (e.g., "A and B"), we compute the union:

A + B = MinHashSelect(A ∪ B, k)

The bundled result maintains similarity to both inputs, enabling content-addressable retrieval from superposed knowledge.
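
Reusing minhash_select from the binding sketch above, superposition reduces to a one-line union (again an illustrative sketch, not a normative API):

    def bundle(a: frozenset[int], b: frozenset[int], k: int) -> frozenset[int]:
        # Superposition: set union, re-sparsified back to cardinality k.
        return minhash_select(set(a) | set(b), k)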

4.3 Sparsification: The Min-Hash Filter

This innovation lets the system maintain a fixed size k regardless of operation depth. From the expanded set of candidates, we keep only the k members with the lowest permuted hash value. Read the full Min-Hash explanation →

Why it works: Min-Hash is an unbiased estimator of Jaccard similarity. If A and B are structurally similar, they will generate the same hash minima with probability equal to their similarity.

The Min-Hash filter thus provides two guarantees at once: bounded memory (cardinality never exceeds k) and similarity preservation (selection by hash minima is an unbiased Jaccard estimator).
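
For reference, the exact Jaccard similarity that Min-Hash estimates is trivial to compute on exponent sets (sketch):

    def jaccard(a: frozenset[int], b: frozenset[int]) -> float:
        # Exact Jaccard similarity: |A ∩ B| / |A ∪ B|.
        return len(a & b) / len(a | b) if (a or b) else 1.0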

5. Holographic Capabilities & Analysis

In this system, "Holographic" implies that the information is distributed non-locally across the set elements.

5.1 Distributed Resilience (The Holographic Property)

Unlike a standard database record where losing a byte corrupts a specific field, SPHDC is resilient to set degradation.

Scenario: Given a concept represented by k = 64 elements.

Damage: Randomly lose 50% of the elements (network packet loss, memory corruption).

Result: The remaining 32 elements still maintain substantial Jaccard overlap with the original concept (32 shared elements out of 64, i.e. similarity 0.5), so nearest-neighbor retrieval still succeeds.

Conclusion: The semantic identity remains intact, merely becoming "noisier".

Reconstruction: We can still successfully bind/unbind with a damaged vector, provided the Signal-to-Noise Ratio (SNR) remains above the discrimination threshold.
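
This scenario is easy to simulate. The sketch below drops a random fraction of a vector's elements; degrade is a hypothetical helper for experimentation, not part of the specification:

    import random

    def degrade(v: frozenset[int], loss: float, seed: int | None = None) -> frozenset[int]:
        # Randomly discard `loss` fraction of the elements (e.g. 0.5 for 50%).
        rng = random.Random(seed)
        keep = max(1, round(len(v) * (1.0 - loss)))
        return frozenset(rng.sample(sorted(v), keep))

    # A 50%-damaged k = 64 vector still shares 32 of 64 elements with the
    # original, i.e. a Jaccard similarity of 0.5 against it.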

5.2 Limits of the Holographic Model

The Variance Trap (Low k Instability)

At low k (e.g., k = 4), the statistical variance of the Min-Hash estimator is high. A single "unlucky" sampling event during binding can sever the link to the original concept.

Mitigation: Use k ≥ 64 for structured queries requiring reliable reversibility.

Superposition Saturation

Adding too many items (A + B + C + D + ...) dilutes the Jaccard similarity of any single component.

Limit: Max capacity ≈ k / 4 items in superposition before retrieval becomes unreliable.

For k = 64: ~16 bundled items maximum
For k = 256: ~64 bundled items maximum

6. Heuristic Algorithms & Improvements

To make SPHDC viable for high-performance production systems, we move beyond brute-force O(N) comparisons.

6.1 The Inverted Index Heuristic (Fast Retrieval)

Since the space is sparse, we can map exponents back to the concepts containing them.

Structure: Map<uint64_t, List<ConceptID>>

Algorithm: Associative Memory Recall

Instead of calculating Jaccard against every concept in the vocabulary:

  1. Input: Query Vector Q = {q1, ..., qk}
  2. Vote: For each exponent qi, look up concepts in the Inverted Index. Increment a counter for each concept found.
  3. Select: The concept with the highest vote count is the winner.
Complexity: O(k) index lookups (plus tallying the retrieved posting lists), essentially independent of vocabulary size N. This makes the system extremely scalable.
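
A minimal sketch of this associative recall (class and method names are illustrative):

    from collections import defaultdict

    class InvertedIndex:
        def __init__(self) -> None:
            # exponent -> IDs of all concepts containing that exponent
            self.postings: dict[int, list[str]] = defaultdict(list)

        def add(self, concept_id: str, v: frozenset[int]) -> None:
            for e in v:
                self.postings[e].append(concept_id)

        def recall(self, query: frozenset[int]) -> str | None:
            # Vote: one tally per shared exponent; the highest count wins.
            votes: dict[str, int] = defaultdict(int)
            for e in query:
                for cid in self.postings.get(e, ()):
                    votes[cid] += 1
            return max(votes, key=votes.get) if votes else None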

6.2 Adaptive k Strategy

We can use variable resolution to balance storage vs. accuracy:

Mode              k Value             Use Case
Storage (Cold)    k = 16              Compress vectors for database storage
Processing (Hot)  k = 64 or k = 128   Load for reasoning/binding with restored precision

Implementation: Store multiple "seeds" or Min-Hash permutations for critical concepts to allow on-demand precision scaling.
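
Downscaling to the cold profile is a single Min-Hash pass (reusing minhash_select from the binding sketch; to_cold is a hypothetical name). Upscaling back to the hot profile genuinely requires the stored seeds or higher-k material mentioned above, since discarded elements cannot be regenerated from the cold form alone:

    def to_cold(v: frozenset[int], k_cold: int = 16, seed: int = 0) -> frozenset[int]:
        # Lossy compression: keep only the k_cold hash-minimal elements.
        return minhash_select(set(v), k_cold, seed)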

6.3 Bloom Filter Optimization

For rapid rejection of non-matches, the set of exponents can be inserted into a small Bloom Filter.

Check: Bloom(A) & Bloom(B)

If the popcount of the bitwise AND is low, we skip the expensive Jaccard calculation.

This provides O(1) rejection of definitely-non-matching candidates, dramatically reducing average query time in large vocabularies.
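
The sketch below packs a vector's exponents into a 256-bit mask and counts shared bits as a cheap pre-filter; the filter width, hash count, and threshold are illustrative parameters, and perm_hash is reused from the binding sketch:

    BLOOM_BITS = 256

    def bloom(v: frozenset[int], n_hashes: int = 2, seed: int = 0) -> int:
        # Insert every exponent into a small Bloom filter (an int bitmask).
        bits = 0
        for e in v:
            for i in range(n_hashes):
                bits |= 1 << (perm_hash(e, seed + i) % BLOOM_BITS)
        return bits

    def maybe_similar(bloom_a: int, bloom_b: int, threshold: int = 4) -> bool:
        # Cheap rejection: if few bits are shared, skip the Jaccard step.
        return (bloom_a & bloom_b).bit_count() >= threshold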

7. Mathematical Properties Summary

Capacity      Practically infinite (> 2²⁵⁰ states even at minimal k)
Robustness    Holographic; tolerates element loss (k-degradation)
Retrieval     O(k) via inverted-index heuristics, independent of vocabulary size
Binding       XOR cancellation via Cartesian XOR
Constraint    Requires k ≥ 64 for complex reasoning chains to overcome statistical variance

8. The Bridge: Symbolic AI on Neural Substrates

SPHDC offers a mathematically rigorous path to Symbolic AI on Neural Substrates: it pairs discrete, exact GF(2) set algebra (the symbolic side) with distributed, similarity-based holographic representations (the neural side).

This hybrid nature allows SPHDC to:

  1. Represent symbolic knowledge with mathematical precision
  2. Compute using operations amenable to hardware acceleration
  3. Degrade gracefully under noise, unlike brittle symbolic systems
  4. Scale to massive vocabularies without proportional memory growth

9. Comparison with Dense Binary HDC

Aspect            Dense Binary                       SPHDC
Representation    Fixed-length bit array (d bits)    Set of k integers
Memory (typical)  256 bytes (d = 2048)               32 bytes (k = 4) to 512 bytes (k = 64)
Binding           XOR (exact cancellation)           Cartesian XOR (statistical cancellation)
Bundling          Majority vote                      Set union + Min-Hash
Similarity        Hamming (deterministic)            Jaccard (statistical estimator)
State Space       2ᵈ (2²⁰⁴⁸ for d = 2048)            C(2⁶⁴, k) (≈ 2²⁵¹ for k = 4)
Reversibility     Exact                              Statistical (variance depends on k)

10. Implementation Considerations

10.1 Choosing k

k = 4: Ultra-light, suitable for simple fact storage and shallow queries
k = 16: Balanced, good for moderate reasoning depth
k = 64: Robust, recommended for complex inference chains
k = 256: High-fidelity, for precision-critical applications

10.2 When to Use SPHDC

SPHDC is the better fit when memory or bandwidth is tight (32-512 bytes per concept), when vocabularies are massive (inverted-index recall stays independent of N), and when graceful degradation under noise matters more than exact reversibility.

10.3 When to Prefer Dense Binary

Dense binary HDC remains preferable when exact, deterministic reversibility of binding is required, or when deep binding chains at low k would accumulate too much statistical (Min-Hash) variance.
