AGISystem2 uses hash functions to generate deterministic concept vectors from string names. This ensures the same concept name always produces the same vector, enabling reproducibility and distributed computation.

Why Hash Functions?

In HDC, each concept needs a unique high-dimensional representation. Rather than storing explicit mappings (concept → vector), we derive vectors from names using hash functions:

The DJB2 Hash Function

AGISystem2 uses DJB2 (Daniel J. Bernstein hash 2) as the primary hash function for generating seeds. DJB2 is chosen for its simplicity, speed, and good distribution properties.

Algorithm

function djb2(str) {
    let hash = 5381;
    for (let i = 0; i < str.length; i++) {
        hash = ((hash << 5) + hash) + str.charCodeAt(i);
        // Equivalent to: hash = hash * 33 + char
    }
    return hash >>> 0;  // Convert to unsigned 32-bit
}

Properties

Property Value
Output size 32 bits (unsigned integer)
Initial value 5381 (magic constant)
Multiplier 33 (via shift-and-add)
Collision resistance Good for typical concept names
Speed Very fast (simple arithmetic)

From Hash to Vector

The hash value serves as a seed for a pseudo-random number generator (PRNG), which then generates the full vector:

// Concept name → Hash seed → PRNG → Vector

"Dog" → DJB2("Dog") = 0x7b8c3d2e
      → PRNG(seed=0x7b8c3d2e)
      → [random bits or exponents]

For Dense-Binary HDC

The PRNG generates n random bits (typically 2048), with approximately 50% ones and 50% zeros:

seed = DJB2("Dog")
prng = new PRNG(seed)
vector = new Uint32Array(64)  // 64 * 32 = 2048 bits
for (i = 0; i < 64; i++) {
    vector[i] = prng.next()
}

For SPHDC

The PRNG generates k unique 64-bit integers (exponents):

seed = DJB2("Dog")
prng = new PRNG(seed)
exponents = new Set()
while (exponents.size < k) {
    exponents.add(prng.next64())  // 64-bit value
}
vector = Array.from(exponents)

The PRNG: Mulberry32

AGISystem2 uses Mulberry32, a fast 32-bit PRNG with good statistical properties:

function mulberry32(seed) {
    return function() {
        let t = seed += 0x6D2B79F5;
        t = Math.imul(t ^ t >>> 15, t | 1);
        t ^= t + Math.imul(t ^ t >>> 7, t | 61);
        return ((t ^ t >>> 14) >>> 0);
    };
}

For 64-bit values (SPHDC), two 32-bit outputs are combined:

function next64(prng) {
    const high = prng();
    const low = prng();
    return BigInt(high) << 32n | BigInt(low);
}

Why Not Cryptographic Hashes?

Cryptographic hash functions (SHA-256, etc.) are not used because:

However, for applications requiring privacy-preserving properties, cryptographic hashes could be used. See Privacy-Preserving HDC for analysis.

Collision Handling

Hash collisions (different names producing the same hash) are extremely rare with 32-bit hashes for typical vocabulary sizes:

Vocabulary Size Collision Probability
1,000 concepts < 0.01%
10,000 concepts < 1%
100,000 concepts ~11% (birthday paradox)

For very large vocabularies, the system could use 64-bit hashes or content-addressable naming schemes.

Related Pages