Dense-Binary HDC represents concepts as fixed-length binary vectors where approximately half the bits are 1 and half are 0. This is the classic approach to Hyperdimensional Computing, providing robust distributed representations.
In Dense-Binary HDC, a concept is a long binary string. The key insight is that in high-dimensional spaces (thousands of bits), randomly generated binary vectors are almost orthogonal: any two agree on approximately 50% of bit positions, which is the mathematical "neutral" point of the similarity measure.
A binary hypervector V is a fixed-length sequence of bits:
V = [b₀, b₁, ..., bₙ₋₁] where bᵢ ∈ {0, 1}
Typical dimensions: n = 2048 bits (256 bytes) or n = 32768 bits (4KB).
The power of high-dimensional binary spaces comes from a remarkable statistical property.
In a binary space of n dimensions, two randomly generated vectors agree on approximately n/2 bits. The number of agreeing positions follows a Binomial(n, 1/2) distribution, so its standard deviation is √n/2: for n = 2048, about 23 bits, or roughly 1% of the dimension.
Because random vectors are all nearly "equidistant" from each other (at ~50% similarity, with tiny fluctuations), the space can accommodate an essentially unlimited number of distinct concepts. Each new random vector is, with overwhelming probability, "far enough" from all existing vectors.
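To see this concentration concretely, here is a minimal NumPy sketch (the helper names `random_hv` and `sim` are ours, introduced for illustration and reused in later snippets):

```python
import numpy as np

n = 2048                          # dimension in bits
rng = np.random.default_rng(0)

def random_hv(rng, n):
    """Random dense-binary hypervector: each bit is 1 with probability 0.5."""
    return rng.integers(0, 2, size=n, dtype=np.uint8)

def sim(a, b):
    """Fraction of bit positions on which a and b agree."""
    return 1.0 - np.count_nonzero(a ^ b) / a.size

# Pairwise similarity of unrelated random vectors concentrates near 0.5.
vectors = [random_hv(rng, n) for _ in range(100)]
sims = [sim(vectors[i], vectors[j])
        for i in range(100) for j in range(i + 1, 100)]
print(f"mean={np.mean(sims):.3f} std={np.std(sims):.4f}")
# typical output: mean=0.500 std=0.011   (std ≈ 1 / (2 * sqrt(n)))
```

With n = 2048 the standard deviation of the similarity is about 0.011, so a measured similarity of 0.6 already sits roughly nine standard deviations above chance.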
BIND creates associations between concepts. In Dense-Binary, BIND is implemented as bitwise XOR.
Given vectors A and B, their binding BIND(A, B) is computed bit-by-bit:
BIND(A, B)ᵢ = Aᵢ XOR Bᵢ
XOR produces 1 when bits differ, 0 when they match.
XOR is its own inverse, so Dense-Binary supports reversible binding: BIND(A, A) = 0 (all zeros).
BIND(BIND(A, B), B) = A
Binding with B, then binding again with B, perfectly recovers A. This is the mathematical foundation for querying knowledge bases.
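A short sketch of these identities, reusing `random_hv` and `sim` from the snippet above (XOR on NumPy bit arrays stands in for BIND):

```python
A = random_hv(rng, n)
B = random_hv(rng, n)

bound = A ^ B                         # BIND(A, B)
assert not np.any(A ^ A)              # BIND(A, A) = 0 (all zeros)
assert np.array_equal(bound ^ B, A)   # BIND(BIND(A, B), B) = A, exactly

# The bound vector resembles neither input (similarity near 0.5):
print(sim(bound, A), sim(bound, B))
```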
Bundling combines multiple vectors into one by voting on each bit position.
Given vectors V₁, V₂, ..., Vₖ, their bundle is computed bit-by-bit:
bundle(V₁...Vₖ)ᵢ = 1 if the majority of the Vⱼ,ᵢ are 1, else 0
(Ties, which can occur when k is even, are typically broken randomly or with a fixed tie-breaking vector.)
As more vectors are bundled, the expected similarity to each original decreases toward the 0.5 noise floor, roughly as 0.5 + 1/√(2πk) (see the sketch after the table):
| Vectors Bundled | Expected Similarity to Each | Quality |
|---|---|---|
| 3 | ~0.75 | Excellent |
| 10 | ~0.62 | Good |
| 50 | ~0.56 | Usable |
| 100 | ~0.54 | Marginal |
| 200+ | ~0.53 | Near noise |
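A sketch of majority-vote bundling under these definitions, reusing the earlier helpers (the random tie-breaking for even k is one common choice, not mandated by the text):

```python
def bundle(vectors):
    """Bitwise majority vote; ties (possible for even k) broken randomly."""
    k = len(vectors)
    counts = np.sum(vectors, axis=0)          # number of 1s at each position
    out = (2 * counts > k).astype(np.uint8)   # clear majority of 1s
    ties = 2 * counts == k
    out[ties] = rng.integers(0, 2, size=int(np.count_nonzero(ties)),
                             dtype=np.uint8)
    return out

for k in (3, 10, 50, 100):
    vs = [random_hv(rng, n) for _ in range(k)]
    b = bundle(vs)
    print(k, round(float(np.mean([sim(b, v) for v in vs])), 3))
    # matches the table above: ≈ 0.75, 0.62, 0.56, 0.54
```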
Similarity measures how many bits two vectors share.
sim(A, B) = 1 - (popcount(A XOR B) / n)
Where popcount counts the number of 1-bits (differing positions) and n is the dimension.
```
A       = 10110010... (2048 bits)
B       = 01100110... (2048 bits)
A XOR B = 11010100... (bits that differ)

popcount(A XOR B) = 980   (number of differing bits)
similarity = 1 - (980 / 2048) = 1 - 0.478 = 0.522
```

Interpretation: nearly random (close to 0.5), so A and B are unrelated.
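Because a 2048-bit vector fits in 256 bytes, similarity can be computed directly on packed storage with a popcount. A sketch, assuming Python 3.10+ for `int.bit_count()` and the `A`, `B` vectors from the earlier snippets:

```python
# Pack 2048 bits into 256 bytes, then compute sim via popcount.
packed_A = np.packbits(A)      # shape (256,), dtype uint8
packed_B = np.packbits(B)

def sim_packed(pa, pb, n):
    """sim(A, B) = 1 - popcount(A XOR B) / n, on byte-packed vectors."""
    diff = int.from_bytes((pa ^ pb).tobytes(), "big")
    return 1.0 - diff.bit_count() / n   # int.bit_count() needs Python 3.10+

print(sim_packed(packed_A, packed_B, n))   # ≈ 0.5 for unrelated A and B
```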
Each concept gets a deterministic binary vector generated from its name.
seed = DJB2("Dog")The same name always produces the same vector. This enables:
Structured knowledge is encoded by binding concepts with position vectors.
fact(loves, John, Mary) = Loves BIND (Pos1 BIND John) BIND (Pos2 BIND Mary)
Position vectors (Pos1, Pos2, ...) distinguish argument positions.
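A sketch of the encoding, with variable names mirroring the formula (`concept_hv` is the deterministic generator assumed above):

```python
# Encode fact(loves, John, Mary); BIND is XOR throughout.
Loves = concept_hv("Loves")
John  = concept_hv("John")
Mary  = concept_hv("Mary")
Pos1  = concept_hv("Pos1")
Pos2  = concept_hv("Pos2")

fact_vector = Loves ^ (Pos1 ^ John) ^ (Pos2 ^ Mary)
```

Since XOR is associative and commutative, the parentheses are cosmetic: the fact flattens to a single XOR of five vectors. One caveat worth noting is that swapping the two fillers yields the same five operands and hence the same vector, so this composition alone cannot distinguish loves(John, Mary) from loves(Mary, John); HDC systems that must preserve argument order typically bundle the role-filler pairs rather than XOR-ing them.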
Since Dense-Binary implements BIND as XOR, queries can cancel known parts to find unknowns.
To find ?who in "loves(?who, Mary)", unbind the known parts:
candidate = KB BIND Loves BIND (Pos2 BIND Mary)
Then extract the answer by unbinding Pos1 and matching against the vocabulary:
```
# KB contains: loves(John, Mary) encoded as fact_vector
# Query: Who loves Mary?  i.e. loves(?who, Mary)

partial   = Loves BIND (Pos2 BIND Mary)

# Unbind from KB (XOR is reversible!)
candidate = KB BIND partial
          = fact_vector BIND Loves BIND (Pos2 BIND Mary)
          = Pos1 BIND John

# Extract the answer
raw_answer = candidate BIND Pos1        # = John

# Match against the vocabulary
sim(raw_answer, John) ≈ 1.0  →  Answer: John
```