This document provides a rigorous theoretical analysis of the Sparse Polynomial HDC (SPHDC) strategy. SPHDC represents a paradigm shift from classical dense Hyperdimensional Computing, utilizing small finite sets of high-entropy integers instead of large binary vectors.


1. Conceptual Foundation: Massive Entropy in Compact Form

SPHDC represents a fundamental departure from classical dense HDC. Instead of using 10,000-bit vectors, we utilize small finite sets of high-entropy integers. Each integer in the set (an exponent) acts as a coordinate in a hyper-sparse space.

1.1 The Entropy Argument (Capacity Analysis)

Consider a vector with cardinality k = 4, where each element is a 64-bit integer (eᵢ ∈ {0, 1, …, 2⁶⁴ − 1}):

Value space for a single element: 2⁶⁴ ≈ 1.84 × 10¹⁹

Unique combinations for a set of 4 elements:
C(2⁶⁴, 4) = 2⁶⁴(2⁶⁴ − 1)(2⁶⁴ − 2)(2⁶⁴ − 3) / 4! ≈ 2²⁵⁶ / 24 ≈ 2²⁵¹

Conclusion: Even at minimal k, the state space (≈ 10⁷⁶) is astronomically larger than any conceivable vocabulary. The "Bounded Memory" constraint is strictly a computational efficiency choice, not a limit on representational capacity.

We are effectively compressing massive semantic entropy into a tiny memory footprint. This is the key insight that makes SPHDC viable: representational capacity is virtually unlimited, while memory usage remains minimal.

2. Data Structure: SPVector

An SPVector V is defined as a finite set of k distinct exponents:

V = { e1, e2, ..., ek }
Property                            Specification
Data Type                           Set<uint64_t> or SortedArray<uint64_t> (for SIMD efficiency)
Memory Footprint                    k × 8 bytes
Standard Robust Profile (k = 64)    512 bytes
Ultra-Light Profile (k = 4)         32 bytes

This format functions as a Holographic Signature: it captures the structural essence of information using anchor points in the GF(2) mathematical space.
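
As a concrete illustration, here is a minimal Python sketch of the structure (a frozenset of Python ints standing in for Set<uint64_t>; the name random_spvector is illustrative, not part of the specification):

    import random

    def random_spvector(k: int = 4, seed: int | None = None) -> frozenset[int]:
        # Draw k distinct 64-bit exponents uniformly at random.
        rng = random.Random(seed)
        exponents: set[int] = set()
        while len(exponents) < k:
            exponents.add(rng.getrandbits(64))
        return frozenset(exponents)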

3. Mathematical Foundation: GF(2)

To understand why SPHDC works, we must first understand GF(2) - the Galois Field with two elements. This is the algebraic structure that makes XOR-based binding reversible. Read the full GF(2) explanation →

3.1 What is GF(2)?

GF(2) (also written as 𝔽₂ or ℤ/2ℤ) is the simplest possible field. It contains exactly two elements:

GF(2) = { 0, 1 }

With two operations defined:

Addition in GF(2)            Multiplication in GF(2)

  + | 0   1                    × | 0   1
  0 | 0   1                    0 | 0   0
  1 | 1   0                    1 | 0   1

Key Observation: Addition in GF(2) is exactly the XOR operation! And crucially: 1 + 1 = 0 (not 2, because 2 doesn't exist in GF(2)).

3.2 Why GF(2) Addition Has XOR Cancellation

The most important property for SPHDC is that every element is its own additive inverse:

∀ a ∈ GF(2): a + a = 0

Proof:
0 + 0 = 0 ✓
1 + 1 = 0 ✓

This means: to "subtract" in GF(2), you simply add again. There is no difference between + and −.

a + b = c implies c + b = a

Example: 1 + 1 = 0, therefore 0 + 1 = 1

3.3 Polynomials over GF(2)

A polynomial over GF(2) has coefficients that are either 0 or 1:

P(x) = x⁷ + x³ + x + 1

This means: coefficients at positions 7, 3, 1, 0 are 1; all others are 0.
Equivalently: the set of exponents is {0, 1, 3, 7}

The isomorphism: A sparse polynomial over GF(2) can be represented as the set of exponents where the coefficient is 1.

3.4 Polynomial Addition = Set Symmetric Difference

When we add two polynomials over GF(2), coefficients at each position are added mod 2:

P₁(x) = x⁵ + x³ + x → Set: {1, 3, 5}
P₂(x) = x³ + x² + 1 → Set: {0, 2, 3}

P₁ + P₂ = x⁵ + x² + x + 1

Why? At position 3: 1 + 1 = 0 (cancels!)
Result set: {0, 1, 2, 5} = symmetric difference of {1,3,5} and {0,2,3}
Symmetric Difference: A △ B = (A ∪ B) − (A ∩ B)
Elements that appear in exactly one set, not both.
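
Python's set XOR operator computes exactly this symmetric difference, so the example above can be verified directly (a minimal sketch):

    # P1(x) = x^5 + x^3 + x and P2(x) = x^3 + x^2 + 1 as exponent sets
    p1 = {1, 3, 5}
    p2 = {0, 2, 3}

    # GF(2) polynomial addition = symmetric difference of exponent sets;
    # the shared x^3 term cancels.
    assert p1 ^ p2 == {0, 1, 2, 5}   # x^5 + x^2 + x + 1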

3.5 Why This Matters for SPHDC

The GF(2) structure gives us reversible binding:

If we encode: C = A xor B (XOR = GF(2) addition)

Then: C xor B = A xor B xor B = A xor 0 = A

We can recover A by XORing with B again!

This is not magic - it's a direct consequence of the algebraic structure of GF(2) where every element is its own inverse.

3.6 Extension to 64-bit Integers

A 64-bit integer can be viewed as a vector of 64 independent GF(2) elements:

a = 0x5A = 01011010₂
b = 0x3C = 00111100₂

a xor b = 01100110₂ = 0x66

Each bit position operates independently in GF(2).

XOR cancellation holds for the entire 64-bit word:

(a xor b) xor b = a
(0x5A xor 0x3C) xor 0x3C = 0x66 xor 0x3C = 0x5A

3.7 From Bits to Sets: The SPHDC Connection

SPHDC takes this one step further. Instead of operating on bit positions within a fixed-width integer, we operate on sets of exponents representing sparse polynomials:

Traditional GF(2) polynomial: x⁷ + x³ + 1 → bits at positions 0, 3, 7

SPHDC: {e₁, e₂, e₃, e₄} where each eᵢ is a 64-bit "exponent"

The "polynomial" would be: x^e₁ + x^e₂ + x^e₃ + x^e₄
(astronomically sparse - only 4 terms out of 2⁶⁴ possible!)

The Cartesian XOR binding simulates polynomial multiplication in this sparse representation, preserving the algebraic properties of GF(2).

4. Algebraic Operations

The system relies on the isomorphism between sets and sparse polynomials over GF(2), adjusted to maintain constant size.

4.1 Binding: Cartesian XOR ( BIND )

To bind two concepts A and B, we combine every element of A with every element of B. This is analogous to polynomial multiplication that forms every cross term but never collects like terms, followed by a representative selection.

A BIND B = MinHashSelect({ a xor b | a ∈ A, b ∈ B }, k)

Mechanism:

  1. Generate k × k intermediate terms via XOR
  2. Since inputs are 64-bit, the result a xor b is valid within the same space
  3. Apply Min-Hash sparsification to maintain cardinality k
Key Property: a xor a = 0. This enables theoretical reversibility: (A BIND B) BIND B → A
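
A minimal Python sketch of this mechanism follows. The helper names (perm_hash, minhash_select, bind) and the use of BLAKE2b as the permutation hash are illustrative assumptions; any high-quality seeded 64-bit hash would do:

    import hashlib

    def perm_hash(e: int, seed: int = 0) -> int:
        # Map a 64-bit exponent to a pseudo-random 64-bit value under a seed.
        data = seed.to_bytes(8, "little") + e.to_bytes(8, "little")
        return int.from_bytes(hashlib.blake2b(data, digest_size=8).digest(), "little")

    def minhash_select(candidates: set[int], k: int, seed: int = 0) -> frozenset[int]:
        # Keep the k candidates with the smallest permuted hash values.
        return frozenset(sorted(candidates, key=lambda e: perm_hash(e, seed))[:k])

    def bind(a: frozenset[int], b: frozenset[int], k: int) -> frozenset[int]:
        # Cartesian XOR: every pairwise XOR term, then Min-Hash sparsification.
        return minhash_select({x ^ y for x in a for y in b}, k)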

4.2 Superposition: Set Union (+)

To bundle information (e.g., "A and B"), we compute the union:

A + B = MinHashSelect(A ∪ B, k)

The bundled result maintains similarity to both inputs, enabling content-addressable retrieval from superposed knowledge.
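
Reusing minhash_select from the binding sketch above, superposition reduces to a one-line union (again an illustrative sketch, not a normative API):

    def bundle(a: frozenset[int], b: frozenset[int], k: int) -> frozenset[int]:
        # Superposition: set union, re-sparsified back to cardinality k.
        return minhash_select(set(a) | set(b), k)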

4.3 Sparsification: The Min-Hash Filter

This innovation lets the system maintain a fixed size k regardless of operation depth. From the expanded set of candidates, we keep only the k members with the lowest permuted hash value. Read the full Min-Hash explanation →

Why it works: Min-Hash is an unbiased estimator of Jaccard similarity. If A and B are structurally similar, they will generate the same hash minima with probability equal to their similarity.

The Min-Hash filter thus provides two guarantees at once: bounded memory (cardinality never exceeds k) and similarity preservation (selection by hash minima is an unbiased Jaccard estimator).
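
For reference, the exact Jaccard similarity that Min-Hash estimates is trivial to compute on exponent sets (sketch):

    def jaccard(a: frozenset[int], b: frozenset[int]) -> float:
        # Exact Jaccard similarity: |A ∩ B| / |A ∪ B|.
        return len(a & b) / len(a | b) if (a or b) else 1.0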

5. Holographic Capabilities & Analysis

In this system, "Holographic" implies that the information is distributed non-locally across the set elements.

5.1 Distributed Resilience (The Holographic Property)

Unlike a standard database record where losing a byte corrupts a specific field, SPHDC is resilient to set degradation.

Scenario: Given a concept represented by k = 64 elements.

Damage: Randomly lose 50% of the elements (network packet loss, memory corruption).

Result: The remaining 32 elements still maintain substantial Jaccard overlap with the original concept (32 shared elements out of 64, i.e. similarity 0.5), so nearest-neighbor retrieval still succeeds.

Conclusion: The semantic identity remains intact, merely becoming "noisier".

Reconstruction: We can still successfully bind/unbind with a damaged vector, provided the Signal-to-Noise Ratio (SNR) remains above the discrimination threshold.
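
This scenario is easy to simulate. The sketch below drops a random fraction of a vector's elements; degrade is a hypothetical helper for experimentation, not part of the specification:

    import random

    def degrade(v: frozenset[int], loss: float, seed: int | None = None) -> frozenset[int]:
        # Randomly discard `loss` fraction of the elements (e.g. 0.5 for 50%).
        rng = random.Random(seed)
        keep = max(1, round(len(v) * (1.0 - loss)))
        return frozenset(rng.sample(sorted(v), keep))

    # A 50%-damaged k = 64 vector still shares 32 of 64 elements with the
    # original, i.e. a Jaccard similarity of 0.5 against it.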

5.2 Limits of the Holographic Model

The Variance Trap (Low k Instability)

At low k (e.g., k = 4), the statistical variance of the Min-Hash estimator is high. A single "unlucky" sampling event during binding can sever the link to the original concept.

Mitigation: Use k ≥ 64 for structured queries requiring reliable reversibility.

Superposition Saturation

Adding too many items (A + B + C + D + ...) dilutes the Jaccard similarity of any single component.

Limit: Max capacity ≈ k / 4 items in superposition before retrieval becomes unreliable.

For k = 64: ~16 bundled items maximum
For k = 256: ~64 bundled items maximum

6. Heuristic Algorithms & Improvements

To make SPHDC viable for high-performance production systems, we move beyond brute-force O(N) comparisons.

6.1 The Inverted Index Heuristic (Fast Retrieval)

Since the space is sparse, we can map exponents back to the concepts containing them.

Structure: Map<uint64_t, List<ConceptID>>

Algorithm: Associative Memory Recall

Instead of calculating Jaccard against every concept in the vocabulary:

  1. Input: Query Vector Q = {q1, ..., qk}
  2. Vote: For each exponent qi, look up concepts in the Inverted Index. Increment a counter for each concept found.
  3. Select: The concept with the highest vote count is the winner.
Complexity: O(k) index lookups (plus tallying the retrieved posting lists), essentially independent of vocabulary size N. This makes the system extremely scalable.
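
A minimal sketch of this associative recall (class and method names are illustrative):

    from collections import defaultdict

    class InvertedIndex:
        def __init__(self) -> None:
            # exponent -> IDs of all concepts containing that exponent
            self.postings: dict[int, list[str]] = defaultdict(list)

        def add(self, concept_id: str, v: frozenset[int]) -> None:
            for e in v:
                self.postings[e].append(concept_id)

        def recall(self, query: frozenset[int]) -> str | None:
            # Vote: one tally per shared exponent; the highest count wins.
            votes: dict[str, int] = defaultdict(int)
            for e in query:
                for cid in self.postings.get(e, ()):
                    votes[cid] += 1
            return max(votes, key=votes.get) if votes else None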

6.2 Adaptive k Strategy

We can use variable resolution to balance storage vs. accuracy:

Mode              k Value             Use Case
Storage (Cold)    k = 16              Compress vectors for database storage
Processing (Hot)  k = 64 or k = 128   Load for reasoning/binding with restored precision

Implementation: Store multiple "seeds" or Min-Hash permutations for critical concepts to allow on-demand precision scaling.
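
Downscaling to the cold profile is a single Min-Hash pass (reusing minhash_select from the binding sketch; to_cold is a hypothetical name). Upscaling back to the hot profile genuinely requires the stored seeds or higher-k material mentioned above, since discarded elements cannot be regenerated from the cold form alone:

    def to_cold(v: frozenset[int], k_cold: int = 16, seed: int = 0) -> frozenset[int]:
        # Lossy compression: keep only the k_cold hash-minimal elements.
        return minhash_select(set(v), k_cold, seed)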

6.3 Bloom Filter Optimization

For rapid rejection of non-matches, the set of exponents can be inserted into a small Bloom Filter.

Check: Bloom(A) & Bloom(B)

If the popcount of the bitwise AND is low, we skip the expensive Jaccard calculation.

This provides O(1) rejection of definitely-non-matching candidates, dramatically reducing average query time in large vocabularies.
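
The sketch below packs a vector's exponents into a 256-bit mask and counts shared bits as a cheap pre-filter; the filter width, hash count, and threshold are illustrative parameters, and perm_hash is reused from the binding sketch:

    BLOOM_BITS = 256

    def bloom(v: frozenset[int], n_hashes: int = 2, seed: int = 0) -> int:
        # Insert every exponent into a small Bloom filter (an int bitmask).
        bits = 0
        for e in v:
            for i in range(n_hashes):
                bits |= 1 << (perm_hash(e, seed + i) % BLOOM_BITS)
        return bits

    def maybe_similar(bloom_a: int, bloom_b: int, threshold: int = 4) -> bool:
        # Cheap rejection: if few bits are shared, skip the Jaccard step.
        return (bloom_a & bloom_b).bit_count() >= threshold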

7. Mathematical Properties Summary

Capacity      Practically infinite (> 2²⁵⁰ states even at minimal k)
Robustness    Holographic; tolerates element loss (k-degradation)
Retrieval     O(k) via inverted-index heuristics, independent of vocabulary size
Binding       XOR cancellation via Cartesian XOR
Constraint    Requires k ≥ 64 for complex reasoning chains to overcome statistical variance

8. The Bridge: Symbolic AI on Neural Substrates

SPHDC offers a mathematically rigorous path to Symbolic AI on Neural Substrates: it pairs discrete, exact GF(2) set algebra (the symbolic side) with distributed, similarity-based holographic representations (the neural side).

This hybrid nature allows SPHDC to:

  1. Represent symbolic knowledge with mathematical precision
  2. Compute using operations amenable to hardware acceleration
  3. Degrade gracefully under noise, unlike brittle symbolic systems
  4. Scale to massive vocabularies without proportional memory growth

9. Comparison with Dense Binary HDC

Aspect            Dense Binary                       SPHDC
Representation    Fixed-length bit array (d bits)    Set of k integers
Memory (typical)  256 bytes (d = 2048)               32 bytes (k = 4) to 512 bytes (k = 64)
Binding           XOR (exact cancellation)           Cartesian XOR (statistical cancellation)
Bundling          Majority vote                      Set union + Min-Hash
Similarity        Hamming (deterministic)            Jaccard (statistical estimator)
State Space       2ᵈ (2²⁰⁴⁸ for d = 2048)            C(2⁶⁴, k) (≈ 2²⁵¹ for k = 4)
Reversibility     Exact                              Statistical (variance depends on k)

10. Implementation Considerations

10.1 Choosing k

k = 4: Ultra-light, suitable for simple fact storage and shallow queries
k = 16: Balanced, good for moderate reasoning depth
k = 64: Robust, recommended for complex inference chains
k = 256: High-fidelity, for precision-critical applications

10.2 When to Use SPHDC

SPHDC is the better fit when memory or bandwidth is tight (32-512 bytes per concept), when vocabularies are massive (inverted-index recall stays independent of N), and when graceful degradation under noise matters more than exact reversibility.

10.3 When to Prefer Dense Binary

Dense binary HDC remains preferable when exact, deterministic reversibility of binding is required, or when deep binding chains at low k would accumulate too much statistical (Min-Hash) variance.
