This document provides a rigorous theoretical analysis of the Sparse Polynomial HDC (SPHDC) strategy. SPHDC represents a fundamental departure from classical dense Hyperdimensional Computing: instead of 10,000-bit vectors, it uses small finite sets of high-entropy integers. Each integer in the set (an exponent) acts as a coordinate in a hyper-sparse space.
Consider a vector with cardinality k = 4, where each element is a 64-bit integer (eᵢ ∈ [0, 2⁶⁴)):

C(2⁶⁴, 4) ≈ 2²⁵⁴
We are effectively compressing massive semantic entropy into a tiny memory footprint. This is the key insight that makes SPHDC viable: representational capacity is virtually unlimited, while memory usage remains minimal.
An SPVector V is defined as a finite set of k distinct exponents:
V = { e1, e2, ..., ek }
| Property | Specification |
|---|---|
| Data Type | Set<uint64_t> or SortedArray<uint64_t> for SIMD efficiency |
| Memory Footprint | k × 8 bytes |
| Standard Robust Profile (k = 64) | 512 bytes |
| Ultra-Light Profile (k = 4) | 32 bytes |
This format functions as a Holographic Signature: it captures the structural essence of information using anchor points in the GF(2) mathematical space.
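As a minimal sketch under the specification above, an SPVector can be modeled as a frozen set of k distinct 64-bit integers. The names and the RNG-based construction below are illustrative, not part of any reference implementation:

```python
import random

MASK64 = (1 << 64) - 1  # exponents live in [0, 2^64)

def random_spvector(k: int, rng: random.Random) -> frozenset[int]:
    """Draw k distinct high-entropy 64-bit exponents."""
    v: set[int] = set()
    while len(v) < k:          # loop until k distinct values are drawn
        v.add(rng.getrandbits(64))
    return frozenset(v)

# Ultra-Light Profile: k = 4 exponents, 32 bytes of payload
a = random_spvector(4, random.Random(42))
assert len(a) == 4 and all(0 <= e <= MASK64 for e in a)
```

A `frozenset` matches the `Set<uint64_t>` option from the table; a sorted tuple would correspond to the `SortedArray<uint64_t>` variant.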
To understand why SPHDC works, we must first understand GF(2), the Galois field with two elements. This is the algebraic structure that makes XOR-based binding reversible.
GF(2) (also written as 𝔽₂ or ℤ/2ℤ) is the simplest possible field. It contains exactly two elements:
GF(2) = { 0, 1 }
With two operations defined:
**Addition in GF(2):**

| + | 0 | 1 |
|---|---|---|
| 0 | 0 | 1 |
| 1 | 1 | 0 |

**Multiplication in GF(2):**

| × | 0 | 1 |
|---|---|---|
| 0 | 0 | 0 |
| 1 | 0 | 1 |
1 + 1 = 0 (not 2, because 2 doesn't exist in GF(2)).
The most important property for SPHDC is that every element is its own additive inverse:
∀ a ∈ GF(2): a + a = 0
0 + 0 = 0 ✓
1 + 1 = 0 ✓
This means: to "subtract" in GF(2), you simply add again. There is no difference between + and −.
a + b = c implies c + b = a
1 + 1 = 0, therefore 0 + 1 = 1 ✓
A polynomial over GF(2) has coefficients that are either 0 or 1:
P(x) = x⁷ + x³ + x + 1 → Set: {0, 1, 3, 7}
The isomorphism: A sparse polynomial over GF(2) can be represented as the set of exponents where the coefficient is 1.
When we add two polynomials over GF(2), coefficients at each position are added mod 2:
P₁(x) = x⁵ + x³ + x → Set: {1, 3, 5}
P₂(x) = x³ + x² + 1 → Set: {0, 2, 3}
P₁ + P₂ = x⁵ + x² + x + 1
1 + 1 = 0 (cancels!)
{0, 1, 2, 5} = symmetric difference of {1,3,5} and {0,2,3}
A △ B = (A ∪ B) − (A ∩ B)
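In Python, the exponent-set view makes this a single operator (`^` on sets is exactly the symmetric difference), shown here with the worked example above:

```python
# GF(2) polynomial addition as set symmetric difference.
# P1 = x^5 + x^3 + x, P2 = x^3 + x^2 + 1.
P1 = {1, 3, 5}
P2 = {0, 2, 3}

# Coefficients add mod 2, so the shared exponent (x^3) cancels.
P_sum = P1 ^ P2  # symmetric difference: (P1 | P2) - (P1 & P2)

assert P_sum == {0, 1, 2, 5}  # x^5 + x^2 + x + 1
```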
The GF(2) structure gives us reversible binding:
C = A xor B (XOR = GF(2) addition)
C xor B = A xor B xor B = A xor 0 = A
This is not magic - it's a direct consequence of the algebraic structure of GF(2) where every element is its own inverse.
A 64-bit integer can be viewed as a vector of 64 independent GF(2) elements:
a = 0x5A = 01011010₂
b = 0x3C = 00111100₂
a xor b = 01100110₂ = 0x66
XOR cancellation holds for the entire 64-bit word:
(a xor b) xor b = a
(0x5A xor 0x3C) xor 0x3C = 0x66 xor 0x3C = 0x5A ✓
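The same cancellation, executed on the hex values from the text:

```python
a = 0x5A          # 01011010
b = 0x3C          # 00111100
c = a ^ b         # bitwise XOR = 64 parallel GF(2) additions

assert c == 0x66  # 01100110
assert c ^ b == a # unbinding recovers a exactly
```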
SPHDC takes this one step further. Instead of operating on bit positions within a fixed-width integer, we operate on sets of exponents representing sparse polynomials:
Dense (classical): x⁷ + x³ + 1 → bits at positions 0, 3, 7 of a fixed-width word

Sparse (SPHDC): {e₁, e₂, e₃, e₄}, where each eᵢ is a 64-bit "exponent", representing the polynomial x^e₁ + x^e₂ + x^e₃ + x^e₄
The Cartesian XOR binding simulates polynomial multiplication in this sparse representation, preserving the algebraic properties of GF(2).
The system relies on the isomorphism between sets and sparse polynomials over GF(2), adjusted to maintain constant size.
To bind two concepts A and B, we combine every element of A with every element of B. This is equivalent to polynomial multiplication without summation, followed by a representative selection.
A BIND B = MinHashSelect({ a xor b | a ∈ A, b ∈ B }, k)
Mechanism: because a xor a = 0, every element of A reappears when C = A BIND B is bound with B again. This enables reversibility: (A BIND B) BIND B → A, up to Min-Hash sampling noise.
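A minimal sketch of Cartesian XOR binding with Min-Hash selection. The names `perm_hash`, `minhash_select`, and `bind` are illustrative, and the blake2b-based permutation is an arbitrary stand-in for whatever keyed hash a real implementation would use:

```python
import hashlib

def perm_hash(e: int, seed: int = 0) -> int:
    """Illustrative Min-Hash permutation: a keyed 64-bit hash of e."""
    data = seed.to_bytes(8, "little") + e.to_bytes(8, "little")
    return int.from_bytes(hashlib.blake2b(data, digest_size=8).digest(),
                          "little")

def minhash_select(candidates: set[int], k: int, seed: int = 0) -> frozenset[int]:
    """Keep the k candidates with the lowest permuted hash value."""
    return frozenset(sorted(candidates, key=lambda e: perm_hash(e, seed))[:k])

def bind(A: frozenset[int], B: frozenset[int], k: int) -> frozenset[int]:
    """Cartesian XOR binding: XOR every (a, b) pair, then shrink to k."""
    return minhash_select({a ^ b for a in A for b in B}, k)

A = frozenset({0x11, 0x22, 0x33, 0x44})
B = frozenset({0xAA, 0xBB, 0xCC, 0xDD})
C = bind(A, B, 4)
assert len(C) == 4

# Reversibility before truncation: unbinding the full Cartesian
# product with B regenerates every element of A, because
# (a xor b) xor b = a. Min-Hash truncation makes this statistical.
full = {a ^ b for a in A for b in B}
assert A <= {c ^ b for c in full for b in B}
```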
To bundle information (e.g., "A and B"), we compute the union:
A + B = MinHashSelect(A ∪ B, k)
The bundled result maintains similarity to both inputs, enabling content-addressable retrieval from superposed knowledge.
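A minimal sketch of bundling under the same size discipline. The `perm_hash` and `bundle` names are illustrative, and blake2b again stands in for an arbitrary permutation hash:

```python
import hashlib

def perm_hash(e: int) -> int:
    """Illustrative permutation hash (same role as in the binding sketch)."""
    return int.from_bytes(
        hashlib.blake2b(e.to_bytes(8, "little"), digest_size=8).digest(),
        "little")

def bundle(A: frozenset[int], B: frozenset[int], k: int) -> frozenset[int]:
    """Bundle by union, then keep the k elements with the lowest hash."""
    return frozenset(sorted(A | B, key=perm_hash)[:k])

A = frozenset({1, 2, 3, 4})
B = frozenset({3, 4, 5, 6})
S = bundle(A, B, 4)
assert S <= A | B and len(S) == 4
# S necessarily overlaps both A and B, so Jaccard(S, A) and
# Jaccard(S, B) stay above chance, which is what enables recall.
```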
Min-Hash selection keeps the representation at a fixed size k regardless of operation depth: from the expanded set of candidates, we keep only the k members with the lowest permuted hash value.
The Min-Hash filter provides a deterministic, size-bounded selection: because the k survivors are a uniform sample under the hash permutation, the retained set approximately preserves Jaccard similarity between vectors.
In this system, "Holographic" implies that the information is distributed non-locally across the set elements.
Unlike a standard database record where losing a byte corrupts a specific field, SPHDC is resilient to set degradation.
Reconstruction: We can still successfully bind/unbind with a damaged vector, provided the Signal-to-Noise Ratio (SNR) remains above the discrimination threshold.
At low k (e.g., k = 4), the statistical variance of the Min-Hash estimator is high. A single "unlucky" sampling event during binding can sever the link to the original concept.
Bundling too many items (A + B + C + D + ...) dilutes the Jaccard similarity between the bundle and any single component, eventually pushing retrieval below the discrimination threshold.
To make SPHDC viable for high-performance production systems, we move beyond brute-force O(N) comparisons.
Since the space is sparse, we can map exponents back to the concepts containing them.
Map<uint64_t, List<ConceptID>>
Algorithm: Associative Memory Recall
Instead of calculating Jaccard against every concept in the vocabulary, we look up each exponent of the query in the inverted index and tally a vote for every concept that contains it; the concepts with the most votes become the retrieval candidates.
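A sketch of this recall loop over a toy vocabulary. All names and the example vectors below are hypothetical, chosen only to make the vote-tallying visible:

```python
from collections import defaultdict

# Inverted index: Map<uint64_t, List<ConceptID>> from the text.
vocabulary = {
    "cat": frozenset({10, 20, 30, 40}),
    "dog": frozenset({10, 25, 35, 45}),
    "car": frozenset({100, 200, 300, 400}),
}
index: dict[int, list[str]] = defaultdict(list)
for concept, vec in vocabulary.items():
    for e in vec:
        index[e].append(concept)

def recall(query: frozenset[int]) -> list[tuple[str, int]]:
    """Tally, per concept, how many query exponents it shares."""
    votes: dict[str, int] = defaultdict(int)
    for e in query:
        for concept in index.get(e, []):
            votes[concept] += 1
    return sorted(votes.items(), key=lambda kv: -kv[1])

# A query overlapping "cat" in 3 of 4 exponents:
result = recall(frozenset({10, 20, 30, 99}))
# → [('cat', 3), ('dog', 1)]  -- "car" is never touched at all
```

Only concepts sharing at least one exponent are ever visited, which is the source of the sub-O(N) behavior the text claims.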
We can use variable resolution to balance storage vs. accuracy:
| Mode | k Value | Use Case |
|---|---|---|
| Storage (Cold) | k = 16 | Compress vectors for database storage |
| Processing (Hot) | k = 64 or k = 128 | Load for reasoning/binding with restored precision |
Implementation: Store multiple "seeds" or Min-Hash permutations for critical concepts to allow on-demand precision scaling.
For rapid rejection of non-matches, the set of exponents can be inserted into a small Bloom Filter.
Bloom(A) & Bloom(B) == 0 ⟹ A ∩ B = ∅ (an empty intersection of the filters proves the exponent sets are disjoint; a nonzero result may be a false positive).
This provides O(1) rejection of definitely-non-matching candidates, dramatically reducing average query time in large vocabularies.
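A toy sketch of this prefilter, with a single 64-bit mask standing in for a small Bloom filter; `bloom` and its parameters are illustrative:

```python
import hashlib

def bloom(vec: frozenset[int], bits: int = 64, hashes: int = 2) -> int:
    """Set `hashes` bits per exponent in a `bits`-wide mask."""
    mask = 0
    for e in vec:
        for i in range(hashes):
            h = hashlib.blake2b(e.to_bytes(8, "little"),
                                digest_size=8, person=bytes([i])).digest()
            mask |= 1 << (int.from_bytes(h, "little") % bits)
    return mask

A = frozenset({1, 2, 3})
B = frozenset({1, 2, 9})
assert bloom(A) & bloom(B) != 0   # shared exponents → overlapping bits

# The converse check: a zero AND would prove disjointness, so a
# candidate can be rejected with one integer operation.
```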
| Property | Summary |
|---|---|
| Capacity | Practically infinite (2²⁵⁶⁺ states) |
| Robustness | Holographic; tolerates element loss (k-degradation) |
| Retrieval | O(1) using inverted-indexing heuristics |
| Binding | XOR cancellation via Cartesian XOR |
| Constraint | Requires k ≥ 64 for complex reasoning chains to overcome statistical variance |
SPHDC offers a mathematically rigorous path to Symbolic AI on neural substrates: the exact algebra of GF(2) supplies symbolic composability, while the statistical, holographic representation supplies the robustness of distributed encodings. This hybrid nature lets SPHDC combine discrete symbol manipulation with graceful degradation.
| Aspect | Dense Binary | SPHDC |
|---|---|---|
| Representation | Fixed-length bit array (d bits) | Set of k integers |
| Memory (typical) | 256 bytes (d=2048) | 32 bytes (k=4) to 512 bytes (k=64) |
| Binding | XOR (exact cancellation) | Cartesian XOR (statistical cancellation) |
| Bundling | Majority vote | Set union + Min-Hash |
| Similarity | Hamming (deterministic) | Jaccard (statistical estimator) |
| State Space | 2ᵈ (2²⁰⁴⁸) | C(2⁶⁴, k) (≈ 2²⁵⁴ for k=4) |
| Reversibility | Exact | Statistical (variance depends on k) |