Dense-Binary HDC represents concepts as fixed-length binary vectors where approximately half the bits are 1 and half are 0. This is the classic approach to Hyperdimensional Computing, providing robust distributed representations.
In Dense-Binary HDC, a concept is a long binary string. The key insight is that in high-dimensional spaces (thousands of bits), randomly generated binary vectors are almost orthogonal: any two agree on approximately 50% of bit positions, which is the mathematical "neutral" point of the similarity measure.
A binary hypervector V is a fixed-length sequence of bits:
V = [b₀, b₁, ..., bₙ₋₁] where bᵢ ∈ {0, 1}
Typical dimensions: n = 2048 bits (256 bytes) or n = 32768 bits (4KB).
The power of high-dimensional binary spaces comes from a remarkable statistical property.
In a binary space of n dimensions, two randomly generated vectors agree on approximately n/2 bits. The number of agreeing positions follows a Binomial(n, 1/2) distribution, so its standard deviation is √n/2: for n = 2048, about 23 bits, or roughly 1% of the dimension.
Because random vectors are all nearly "equidistant" from each other (at ~50% similarity, with tiny fluctuations), the space can accommodate an essentially unlimited number of distinct concepts. Each new random vector is, with overwhelming probability, "far enough" from all existing vectors.
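To see this concentration concretely, here is a minimal NumPy sketch (the helper names `random_hv` and `sim` are ours, introduced for illustration and reused in later snippets):

```python
import numpy as np

n = 2048                          # dimension in bits
rng = np.random.default_rng(0)

def random_hv(rng, n):
    """Random dense-binary hypervector: each bit is 1 with probability 0.5."""
    return rng.integers(0, 2, size=n, dtype=np.uint8)

def sim(a, b):
    """Fraction of bit positions on which a and b agree."""
    return 1.0 - np.count_nonzero(a ^ b) / a.size

# Pairwise similarity of unrelated random vectors concentrates near 0.5.
vectors = [random_hv(rng, n) for _ in range(100)]
sims = [sim(vectors[i], vectors[j])
        for i in range(100) for j in range(i + 1, 100)]
print(f"mean={np.mean(sims):.3f} std={np.std(sims):.4f}")
# typical output: mean=0.500 std=0.011   (std ≈ 1 / (2 * sqrt(n)))
```

With n = 2048 the standard deviation of the similarity is about 0.011, so a measured similarity of 0.6 already sits roughly nine standard deviations above chance.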
BIND creates associations between concepts. In Dense-Binary, BIND is implemented as bitwise XOR.
Given vectors A and B, their binding BIND(A, B) is computed bit-by-bit:
BIND(A, B)ᵢ = Aᵢ XOR Bᵢ
XOR produces 1 when bits differ, 0 when they match.
XOR is its own inverse, so Dense-Binary supports reversible binding: BIND(A, A) = 0 (all zeros).
BIND(BIND(A, B), B) = A
Binding with B, then binding again with B, perfectly recovers A. This is the mathematical foundation for querying knowledge bases.
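A short sketch of these identities, reusing `random_hv` and `sim` from the snippet above (XOR on NumPy bit arrays stands in for BIND):

```python
A = random_hv(rng, n)
B = random_hv(rng, n)

bound = A ^ B                         # BIND(A, B)
assert not np.any(A ^ A)              # BIND(A, A) = 0 (all zeros)
assert np.array_equal(bound ^ B, A)   # BIND(BIND(A, B), B) = A, exactly

# The bound vector resembles neither input (similarity near 0.5):
print(sim(bound, A), sim(bound, B))
```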
Bundling combines multiple vectors into one by voting on each bit position.
Given vectors V₁, V₂, ..., Vₖ, their bundle is computed bit-by-bit:
bundle(V₁...Vₖ)ᵢ = 1 if the majority of the Vⱼ,ᵢ are 1, else 0
(Ties, which can occur when k is even, are typically broken randomly or with a fixed tie-breaking vector.)
As more vectors are bundled, the expected similarity to each original decreases toward the 0.5 noise floor, roughly as 0.5 + 1/√(2πk) (see the sketch after the table):
| Vectors Bundled | Expected Similarity to Each | Quality |
|---|---|---|
| 3 | ~0.75 | Excellent |
| 10 | ~0.62 | Good |
| 50 | ~0.56 | Usable |
| 100 | ~0.54 | Marginal |
| 200+ | ~0.53 | Near noise |
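A sketch of majority-vote bundling under these definitions, reusing the earlier helpers (the random tie-breaking for even k is one common choice, not mandated by the text):

```python
def bundle(vectors):
    """Bitwise majority vote; ties (possible for even k) broken randomly."""
    k = len(vectors)
    counts = np.sum(vectors, axis=0)          # number of 1s at each position
    out = (2 * counts > k).astype(np.uint8)   # clear majority of 1s
    ties = 2 * counts == k
    out[ties] = rng.integers(0, 2, size=int(np.count_nonzero(ties)),
                             dtype=np.uint8)
    return out

for k in (3, 10, 50, 100):
    vs = [random_hv(rng, n) for _ in range(k)]
    b = bundle(vs)
    print(k, round(float(np.mean([sim(b, v) for v in vs])), 3))
    # matches the table above: ≈ 0.75, 0.62, 0.56, 0.54
```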
Similarity measures how many bits two vectors share.
sim(A, B) = 1 - (popcount(A XOR B) / n)
Where popcount counts the number of 1-bits (differing positions) and n is the dimension.
```
A       = 10110010... (2048 bits)
B       = 01100110... (2048 bits)
A XOR B = 11010100... (bits that differ)

popcount(A XOR B) = 980   (number of differing bits)
similarity = 1 - (980 / 2048) = 1 - 0.478 = 0.522
```

Interpretation: nearly random (close to 0.5), so A and B are unrelated.
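Because a 2048-bit vector fits in 256 bytes, similarity can be computed directly on packed storage with a popcount. A sketch, assuming Python 3.10+ for `int.bit_count()` and the `A`, `B` vectors from the earlier snippets:

```python
# Pack 2048 bits into 256 bytes, then compute sim via popcount.
packed_A = np.packbits(A)      # shape (256,), dtype uint8
packed_B = np.packbits(B)

def sim_packed(pa, pb, n):
    """sim(A, B) = 1 - popcount(A XOR B) / n, on byte-packed vectors."""
    diff = int.from_bytes((pa ^ pb).tobytes(), "big")
    return 1.0 - diff.bit_count() / n   # int.bit_count() needs Python 3.10+

print(sim_packed(packed_A, packed_B, n))   # ≈ 0.5 for unrelated A and B
```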
Each concept gets a deterministic binary vector generated from its name.
seed = DJB2("Dog")The same name always produces the same vector. This enables:
Structured knowledge is encoded by binding concepts with position vectors.
fact(loves, John, Mary) = Loves BIND (Pos1 BIND John) BIND (Pos2 BIND Mary)
Position vectors (Pos1, Pos2, ...) distinguish argument positions.
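A sketch of the encoding, with variable names mirroring the formula (`concept_hv` is the deterministic generator assumed above):

```python
# Encode fact(loves, John, Mary); BIND is XOR throughout.
Loves = concept_hv("Loves")
John  = concept_hv("John")
Mary  = concept_hv("Mary")
Pos1  = concept_hv("Pos1")
Pos2  = concept_hv("Pos2")

fact_vector = Loves ^ (Pos1 ^ John) ^ (Pos2 ^ Mary)
```

Since XOR is associative and commutative, the parentheses are cosmetic: the fact flattens to a single XOR of five vectors. One caveat worth noting is that swapping the two fillers yields the same five operands and hence the same vector, so this composition alone cannot distinguish loves(John, Mary) from loves(Mary, John); HDC systems that must preserve argument order typically bundle the role-filler pairs rather than XOR-ing them.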
Since Dense-Binary implements BIND as XOR, queries can cancel known parts to find unknowns.
To find ?who in "loves(?who, Mary)", unbind the known parts:
candidate = KB BIND Loves BIND (Pos2 BIND Mary)
Then extract the answer by unbinding Pos1 and matching against the vocabulary:
```
# KB contains: loves(John, Mary) encoded as fact_vector
# Query: Who loves Mary?  i.e. loves(?who, Mary)

partial   = Loves BIND (Pos2 BIND Mary)

# Unbind from KB (XOR is reversible!)
candidate = KB BIND partial
          = fact_vector BIND Loves BIND (Pos2 BIND Mary)
          = Pos1 BIND John

# Extract the answer
raw_answer = candidate BIND Pos1        # = John

# Match against the vocabulary
sim(raw_answer, John) ≈ 1.0  →  Answer: John
```