AGISystem2 uses hash functions to generate deterministic concept vectors from string names. This ensures the same concept name always produces the same vector, enabling reproducibility and distributed computation.
In HDC, each concept needs a unique high-dimensional representation. Rather than storing explicit mappings (concept → vector), we derive vectors from names using hash functions:
AGISystem2 uses DJB2 (Daniel J. Bernstein hash 2) as the primary hash function for generating seeds. DJB2 is chosen for its simplicity, speed, and good distribution properties.
function djb2(str) {
let hash = 5381;
for (let i = 0; i < str.length; i++) {
hash = ((hash << 5) + hash) + str.charCodeAt(i);
// Equivalent to: hash = hash * 33 + char
}
return hash >>> 0; // Convert to unsigned 32-bit
}
| Property | Value |
|---|---|
| Output size | 32 bits (unsigned integer) |
| Initial value | 5381 (magic constant) |
| Multiplier | 33 (via shift-and-add) |
| Collision resistance | Good for typical concept names |
| Speed | Very fast (simple arithmetic) |
The hash value serves as a seed for a pseudo-random number generator (PRNG), which then generates the full vector:
// Concept name → Hash seed → PRNG → Vector
"Dog" → DJB2("Dog") = 0x7b8c3d2e
→ PRNG(seed=0x7b8c3d2e)
→ [random bits or exponents]
The PRNG generates n random bits (typically 2048), with approximately 50% ones and 50% zeros:
seed = DJB2("Dog")
prng = new PRNG(seed)
vector = new Uint32Array(64) // 64 * 32 = 2048 bits
for (i = 0; i < 64; i++) {
vector[i] = prng.next()
}
The PRNG generates k unique 64-bit integers (exponents):
seed = DJB2("Dog")
prng = new PRNG(seed)
exponents = new Set()
while (exponents.size < k) {
exponents.add(prng.next64()) // 64-bit value
}
vector = Array.from(exponents)
AGISystem2 uses Mulberry32, a fast 32-bit PRNG with good statistical properties:
function mulberry32(seed) {
return function() {
let t = seed += 0x6D2B79F5;
t = Math.imul(t ^ t >>> 15, t | 1);
t ^= t + Math.imul(t ^ t >>> 7, t | 61);
return ((t ^ t >>> 14) >>> 0);
};
}
For 64-bit values (SPHDC), two 32-bit outputs are combined:
function next64(prng) {
const high = prng();
const low = prng();
return BigInt(high) << 32n | BigInt(low);
}
Cryptographic hash functions (SHA-256, etc.) are not used because:
However, for applications requiring privacy-preserving properties, cryptographic hashes could be used. See Privacy-Preserving HDC for analysis.
Hash collisions (different names producing the same hash) are extremely rare with 32-bit hashes for typical vocabulary sizes:
| Vocabulary Size | Collision Probability |
|---|---|
| 1,000 concepts | < 0.01% |
| 10,000 concepts | < 1% |
| 100,000 concepts | ~11% (birthday paradox) |
For very large vocabularies, the system could use 64-bit hashes or content-addressable naming schemes.