This document analyzes the potential for implementing privacy-preserving computation using HDC primitives. We explore the parallels with homomorphic encryption, the possibility of cloud-based reasoning on encrypted knowledge, information leakage risks, and applications in federated learning.

1. The Vision: Compute Without Revealing

The fundamental question: Can we perform reasoning on knowledge without the reasoner knowing what the knowledge represents?

Homomorphic Encryption Analogy:

In homomorphic encryption, we can compute f(E(x)) = E(f(x)): computation runs on encrypted data and yields encrypted results that decrypt to the correct answer.

In HDC, can we achieve: reason(Encode(knowledge)) → Encode(conclusions)?

1.1 The HDC Privacy Hypothesis

HDC has properties that suggest privacy-preserving potential:

  1. Distributed Representation: Information is spread across all dimensions
  2. Deterministic Operations: Binding and bundling are mathematical, not learned
  3. Reversibility: for self-inverse bindings (e.g. XOR), (A BIND B) BIND B = A, without knowing what A or B represent
  4. Composition: Complex structures built from atomic vectors

The key insight: If the atomic concept vectors are secret keys, can the composite structures be safely processed by untrusted parties?

2. Architecture: Secret Atoms, Public Reasoning

2.1 The Basic Scheme

Setup Phase (Client - Private):
1. Generate random seed S (the master secret)
2. For each concept name, derive vector: V_concept = PRNG(Hash(S || "concept"))
3. Keep S and the name→vector mapping private

Encoding Phase (Client - Private):
4. Encode facts using secret atom vectors
5. fact_vec = Relation BIND (Pos1 BIND Arg1) BIND (Pos2 BIND Arg2)
6. Bundle all facts: KB = bundle([fact1, fact2, ...])

Reasoning Phase (Cloud - Public):
7. Send KB (bundled vector) to cloud
8. Cloud performs reasoning operations (prove, query)
9. Cloud returns result vectors

Decoding Phase (Client - Private):
10. Client decodes result vectors using secret atom vectors
11. Match results against known concept vectors
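
To make the scheme concrete, here is a minimal toy sketch of the setup and encoding phases. The helper names (atom, bind, bundle, sim) and the FNV-1a/mulberry32 seed derivation are illustrative stand-ins rather than a fixed API; Section 7 sketches a fuller client/cloud split.

// --- Toy HDC primitives over ±1 vectors (illustrative only) ---
const D = 10000;

// Deterministic "secret" atom: hash(seed || name) seeds a small PRNG.
function fnv1a(str) {
  let h = 0x811c9dc5;
  for (let i = 0; i < str.length; i++) {
    h ^= str.charCodeAt(i);
    h = Math.imul(h, 0x01000193);
  }
  return h >>> 0;
}
function mulberry32(a) {
  return function () {
    a |= 0; a = (a + 0x6d2b79f5) | 0;
    let t = Math.imul(a ^ (a >>> 15), 1 | a);
    t = (t + Math.imul(t ^ (t >>> 7), 61 | t)) ^ t;
    return ((t ^ (t >>> 14)) >>> 0) / 4294967296;
  };
}
const atom = (seed, name) => {
  const rnd = mulberry32(fnv1a(seed + "||" + name));
  return Array.from({ length: D }, () => (rnd() < 0.5 ? 1 : -1));
};

// Self-inverse binding (element-wise product), majority bundling, cosine similarity.
const bind = (a, b) => a.map((v, i) => v * b[i]);
const bundle = (vs) => vs[0].map((_, i) => (vs.reduce((s, v) => s + v[i], 0) >= 0 ? 1 : -1));
const sim = (a, b) => a.reduce((s, v, i) => s + v * b[i], 0) / D;

// Client-side setup and encoding: every atom is derived from the master secret S.
const S = "master-secret";
const encodeFact = (rel, a1, a2) =>
  bind(bind(atom(S, rel), bind(atom(S, "Pos1"), atom(S, a1))),
       bind(atom(S, "Pos2"), atom(S, a2)));

const KB = bundle([
  encodeFact("loves", "John", "Mary"),
  encodeFact("parent", "Mary", "Alice"),
]);
// KB is all the cloud ever receives: a single ±1 vector with no attached semantics.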

2.2 Why This Might Work

The cloud sees only high-dimensional vectors. Without knowing the atom vectors, it cannot map those vectors back to concept names, so the semantic content of the knowledge base stays opaque.

Key Property: HDC operations are structure-preserving regardless of what the vectors represent. The cloud computes correctly without understanding the semantics.

3. Information Leakage Analysis

Unfortunately, HDC representations leak information in several ways. This section analyzes what an adversary can learn.

3.1 Structural Information Leakage

Attack 1: Knowledge Base Size

The bundled KB vector doesn't hide how many facts were bundled. An adversary can estimate the number of facts from the vector's statistical properties.
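
For instance, if the KB is shipped as an unthresholded component-wise sum (an assumption for this sketch; a thresholded ±1 bundle hides the count better), each component is a sum of roughly independent ±1 values, so its variance approximates the number of bundled facts:

// Assumption: kbSum is the component-wise sum of k near-random ±1 fact vectors.
const randVec = (d) => Array.from({ length: d }, () => (Math.random() < 0.5 ? 1 : -1));
const sumBundle = (vs) => vs[0].map((_, i) => vs.reduce((s, v) => s + v[i], 0));

// Per-component variance of a sum of k independent ±1 values is k,
// so the mean squared component estimates the number of bundled facts.
const estimateFactCount = (kbSum) => kbSum.reduce((s, v) => s + v * v, 0) / kbSum.length;

const facts = Array.from({ length: 7 }, () => randVec(10000));
console.log(estimateFactCount(sumBundle(facts)));  // ≈ 7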

Attack 2: Repeated Concepts

If the same concept appears in multiple facts, the KB will have higher similarity to that concept's vector. By probing, or by comparing encoded vectors against one another, an adversary can detect "hot spots": frequently occurring patterns.

Attack 3: Query Correlation

If the client sends multiple queries, the adversary can correlate them. Queries about related concepts will have detectable similarity patterns.

3.2 Similarity-Based Attacks

The fundamental problem: similarity is observable.

Given two encoded facts F1 and F2:
sim(F1, F2) reveals structural relationship

If F1 = loves(John, Mary) and F2 = loves(John, Alice):
sim(F1, F2) > random baseline
→ Adversary learns: F1 and F2 share structure (same relation, overlapping arguments)

This is analogous to the problem of order-preserving encryption leaking ordering information.
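
The effect is easy to reproduce. The sketch below assumes facts are encoded as bundles of role-filler pairs (a common HDC encoding in which shared constituents remain visible as similarity) and reuses the toy helpers (atom, bind, bundle, sim, S) from the Section 2.1 sketch:

// Role-filler bundle encoding (an assumption for this sketch).
const encodeFactRF = (rel, a1, a2) =>
  bundle([atom(S, rel), bind(atom(S, "Pos1"), atom(S, a1)), bind(atom(S, "Pos2"), atom(S, a2))]);

const F1 = encodeFactRF("loves", "John", "Mary");
const F2 = encodeFactRF("loves", "John", "Alice");  // shares the relation and first argument
const F3 = encodeFactRF("owns", "Bob", "Car");      // structurally unrelated

console.log(sim(F1, F2));  // ≈ 0.5: shared structure is visible without knowing any atom
console.log(sim(F1, F3));  // ≈ 0.0: random baseline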

3.3 Dictionary Attacks

If the adversary knows the domain vocabulary (e.g., all possible relation names):

For each candidate concept C in dictionary:
1. Generate probe vector VC (using same PRNG algorithm)
2. Compute sim(KB, VC)
3. High similarity → C is likely in the KB
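
A sketch of this probe loop, reusing the toy helpers and the KB from the Section 2.1 sketch; it probes with fully encoded candidate facts, which presumes the adversary can reproduce the atom derivation (seed, encoding scheme, and vocabulary):

// Dictionary attack: enumerate candidate facts, re-encode them, and probe the public KB.
const relations = ["loves", "parent", "owns"];
const names = ["John", "Mary", "Alice", "Bob"];

const hits = [];
for (const rel of relations)
  for (const a1 of names)
    for (const a2 of names) {
      const probe = encodeFact(rel, a1, a2);   // only possible with the master seed S
      const s = sim(KB, probe);
      if (s > 0.2) hits.push({ fact: `${rel}(${a1}, ${a2})`, s });
    }
console.log(hits);  // recovers loves(John, Mary) and parent(Mary, Alice)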

Mitigation: This attack requires the adversary to reproduce the atom derivation, which means knowing the master seed S. If S is truly secret, the adversary cannot generate correct probe vectors. However, an adversary who observes enough query-response pairs may be able to learn the mapping statistically.

3.4 What Cannot Be Hidden

Information Type | Hidden? | Notes
Exact concept names | Yes (if seed secret) | For hash/PRNG strategies, names can be hashed and not stored; EXACT uses a session-local dictionary for atom IDs
Number of facts | Partially | Estimable from vector statistics
Structural patterns | No | Similarity reveals shared structure
Query patterns | No | Repeated/similar queries are detectable
Reasoning complexity | No | Proof depth visible from computation
Knowledge graph topology | Partially | Connectivity patterns may leak

4. Comparison with Homomorphic Encryption

4.1 Fully Homomorphic Encryption (FHE)

True FHE provides:

- Semantic security: ciphertexts reveal nothing about the underlying plaintext
- General computation: any function expressible as a circuit can be evaluated over ciphertexts
- Provable guarantees: security reduces to well-studied hardness assumptions

HDC provides none of these in a cryptographic sense.

4.2 What HDC Offers Instead

Property | FHE | HDC "Privacy"
Security model | Cryptographic (provable) | Obfuscation (heuristic)
Information leakage | None (semantic security) | Structural patterns leak
Computation speed | Very slow (10,000x+ overhead) | Near real-time
Supported operations | Any (with circuit conversion) | HDC primitives only
Key management | Complex (bootstrapping) | Simple (single seed)
Practical today? | Limited applications | Yes, if leakage acceptable

4.3 The Honest Assessment

HDC is NOT homomorphic encryption.

It provides practical obfuscation with known leakage, not cryptographic security with provable guarantees. Use cases must accept this limitation.

5. Partial Homomorphic Properties

While not fully homomorphic, HDC does exhibit some homomorphic-like properties:

5.1 Additive Homomorphism (Bundling)

E(A) + E(B) = E(A + B)

bundle(encode(fact1), encode(fact2)) = encode(KB containing fact1 and fact2)

Bundling combines encoded facts without decoding them.

5.2 Multiplicative Homomorphism (Binding)

E(A) BIND E(B) = E(A BIND B)

The cloud can bind vectors without knowing what they represent.
This enables constructing new composite concepts from existing ones.

5.3 Query Homomorphism

Given: KB = encode(facts), Q = encode(query)
Result = KB BIND Q^-1 (unbind the query from the KB; for self-inverse bindings such as XOR, Q^-1 = Q)

The cloud computes Result without knowing:
- What facts are in KB
- What the query asks
- What the result means

Client decodes Result using secret atom vectors.
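
A round-trip sketch under the toy encoding of Section 2.1 (self-inverse binding, so Q^-1 = Q; names and the candidate list are illustrative): the cloud unbinds an opaque query from an opaque KB, and only the client can turn the result back into a name.

// Client builds an opaque query for loves(John, ?): bind the known parts, leave Pos2 open.
const Q = bind(
  bind(atom(S, "loves"), bind(atom(S, "Pos1"), atom(S, "John"))),
  atom(S, "Pos2")
);

// Cloud side: pure vector algebra, no semantics.
const resultVec = bind(KB, Q);

// Client side: match the opaque result against known atoms to recover the answer.
const candidates = ["John", "Mary", "Alice", "Bob"];
const decoded = candidates
  .map((c) => ({ name: c, s: sim(resultVec, atom(S, c)) }))
  .sort((a, b) => b.s - a.s);
console.log(decoded[0].name);  // "Mary"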

6. Federated Learning Applications

HDC's properties make it interesting for federated scenarios where multiple parties contribute knowledge without revealing it.

6.1 Federated Knowledge Aggregation

Scenario: Multiple hospitals want to combine medical knowledge without sharing patient data.

Protocol:
1. All parties agree on master seed S (shared secret)
2. Each party encodes their local knowledge: KB_i = encode(local_facts_i)
3. A coordinator bundles: KB_global = bundle(KB_1, KB_2, ..., KB_n)
4. Any party can query KB_global using the shared encoding
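
A sketch of steps 2-4 with two hospitals and the toy helpers from Section 2.1 (the medical facts are purely illustrative):

// Each party encodes its own facts with the shared seed; the coordinator only bundles vectors.
const KB1 = bundle([
  encodeFact("treats", "DrugA", "DiseaseX"),
  encodeFact("causes", "DiseaseX", "SymptomZ"),
]);
const KB2 = bundle([encodeFact("treats", "DrugB", "DiseaseX")]);

const KBglobal = bundle([KB1, KB2]);  // the coordinator never decodes anything

// Any party holding S can probe the combined knowledge.
console.log(sim(KBglobal, encodeFact("treats", "DrugA", "DiseaseX")));  // clearly above 0
console.log(sim(KBglobal, encodeFact("treats", "DrugC", "DiseaseX")));  // ≈ 0: never contributed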

Privacy Properties:

- The coordinator and any outside observer handle only encoded vectors; raw facts and patient data never leave the contributing parties
- After bundling, individual facts are not directly attributable to a specific party
- Every party holding the shared seed S can decode, so the scheme protects against the coordinator and outsiders, not against the other participants

6.2 Private Query Answering

Scenario: User wants to query a cloud knowledge base without revealing what they're asking.

Protocol:
1. User encodes query with private atom vectors: Q = encode(query)
2. Cloud computes: candidates = KB BIND Q
3. Cloud returns top-k similar vectors from candidates
4. User decodes results locally

Analysis:

The query's semantic content stays hidden from the cloud, but the query vectors themselves are observed. As noted in Section 3.1 (Attack 3), repeated or related queries can be correlated, so query patterns leak even when query content does not.

6.3 Differential Privacy Integration

HDC's bundling operation is amenable to differential privacy:

KB_noisy = bundle(KB, noise_vector_1, noise_vector_2, ...)

Adding random "noise facts" provides plausible deniability:
- The presence of any specific fact cannot be proven
- Query accuracy degrades gracefully with noise level
- Standard differential privacy guarantees may apply
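
A sketch of the noisy bundling idea, again with the Section 2.1 toy helpers; here the "noise facts" are plain random ±1 vectors standing in for encoded decoys:

// Bundle random decoy vectors into the KB: real facts stay detectable, but more weakly.
const noise = Array.from({ length: 6 }, () =>
  Array.from({ length: D }, () => (Math.random() < 0.5 ? 1 : -1)));
const KBnoisy = bundle([KB, ...noise]);

const f = encodeFact("loves", "John", "Mary");
console.log(sim(KB, f));       // ≈ 0.5 without noise
console.log(sim(KBnoisy, f));  // noticeably lower: accuracy degrades with the noise level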

7. Implementation Architecture

7.1 Client-Side (Trusted)

// Assumes the HDC library provides SHA256, PRNG_Vector, bind, and similarity primitives.
class PrivateHDCClient {
  constructor(masterSeed, geometry) {
    this.seed = masterSeed;      // Keep secret!
    this.geometry = geometry;    // Vector dimensionality / encoding parameters for PRNG_Vector
    this.atomCache = new Map();
  }

  // Generate deterministic but secret atom vector
  getAtom(conceptName) {
    if (!this.atomCache.has(conceptName)) {
      const hash = SHA256(this.seed + conceptName);
      const vec = PRNG_Vector(hash, this.geometry);
      this.atomCache.set(conceptName, vec);
    }
    return this.atomCache.get(conceptName);
  }

  // Encode a fact using secret atoms
  encodeFact(relation, ...args) {
    let vec = this.getAtom(relation);
    for (let i = 0; i < args.length; i++) {
      const posVec = this.getAtom(`Pos${i+1}`);
      const argVec = this.getAtom(args[i]);
      vec = bind(vec, bind(posVec, argVec));
    }
    return vec;  // Safe to send to cloud
  }

  // Decode result by matching against known atoms
  decodeResult(resultVec, candidates) {
    return candidates
      .map(c => ({ name: c, sim: similarity(resultVec, this.getAtom(c)) }))
      .sort((a, b) => b.sim - a.sim);
  }
}

7.2 Cloud-Side (Untrusted)

class CloudHDCService {
  constructor() {
    this.kb = null;  // Bundled vector, semantics unknown
  }

  // Receive encoded KB from client
  loadKB(encodedKB) {
    this.kb = encodedKB;  // Just a vector, no meaning
  }

  // Process query - pure HDC operations
  query(queryVec) {
    // Unbind the query from the KB (binding is self-inverse in this scheme)
    const result = bind(this.kb, queryVec);
    return result;  // Client will decode
  }

  // Prove goal - returns proof structure, not semantics
  prove(goalVec, ruleVecs) {
    // Standard backward chaining on vectors (backwardChain is assumed to be provided elsewhere)
    // Cloud doesn't know what's being proven
    return this.backwardChain(goalVec, ruleVecs);
  }

  // Federated aggregation
  aggregate(kbVectors) {
    return bundle(kbVectors);
  }
}
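
A sketch of how the two sides might be wired together, assuming the bind, bundle, similarity, SHA256, and PRNG_Vector primitives referenced above exist and that geometry carries the vector dimensionality (the query construction mirrors Section 5.3):

// Client (trusted): encode locally with the secret seed.
const client = new PrivateHDCClient("master-secret", { dimensions: 10000 });
const kb = bundle([
  client.encodeFact("loves", "John", "Mary"),
  client.encodeFact("parent", "Mary", "Alice"),
]);

// Cloud (untrusted): stores and processes opaque vectors only.
const cloud = new CloudHDCService();
cloud.loadKB(kb);

// Query loves(John, ?): bind the known parts, leave the unknown slot as the bare Pos2 atom.
const q = bind(
  bind(client.getAtom("loves"), bind(client.getAtom("Pos1"), client.getAtom("John"))),
  client.getAtom("Pos2")
);
const res = cloud.query(q);

// Only the client can interpret the result.
console.log(client.decodeResult(res, ["John", "Mary", "Alice"])[0]);  // → { name: "Mary", sim: ... }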

8. Threat Model and Limitations

8.1 What We Protect Against

- An honest-but-curious cloud reading semantic content: without the atom vectors, stored and processed vectors do not reveal concept names
- Casual disclosure of knowledge base contents if the vectors leak, provided the master seed S stays secret

8.2 What We Do NOT Protect Against

- Structural and similarity analysis (Section 3): fact counts, shared structure, and query correlation remain observable
- Adversaries who obtain the seed or can otherwise reproduce the atom derivation (dictionary attacks)
- Sophisticated attackers who accumulate many query-response pairs and learn the mapping statistically

8.3 Appropriate Use Cases

Good fit:
- Applications that can tolerate structural leakage and want defense in depth or data minimization
- Federated aggregation among mutually trusting parties who share the seed
- Low-latency cloud reasoning where FHE's overhead is prohibitive

Bad fit:
- Data requiring cryptographic confidentiality guarantees
- Threat models with sophisticated, well-resourced adversaries
- Settings where even query patterns or knowledge structure are sensitive

9. Research Directions

9.1 Hybrid HDC-FHE Systems

Combine HDC's efficiency with FHE's security for critical operations:

- Use HDC for bulk knowledge storage and retrieval (fast, moderate privacy)
- Use FHE for sensitive computations (slow, strong privacy)
- Develop protocols for secure handoff between layers

9.2 Oblivious HDC Protocols

Apply techniques from secure multi-party computation, for example secret-sharing the knowledge base across non-colluding servers or using oblivious transfer to retrieve atom vectors, so that no single party ever holds both the vectors and their meaning.

9.3 Privacy-Preserving Similarity

Develop protocols where similarity can be computed without revealing either vector:

Alice has vector A, Bob has vector B.
Compute sim(A, B) such that:
- Alice learns only sim(A, B)
- Bob learns only sim(A, B)
- Neither learns the other's vector

9.4 Leakage Quantification

Formal analysis of information leakage, for example bounding how many bits about the knowledge base an adversary can extract from the published vectors and observed query traffic.

10. Conclusions

10.1 The Honest Summary

Claim | Reality
"HDC is homomorphic encryption" | False. No cryptographic security guarantees.
"HDC provides perfect privacy" | False. Structural patterns leak.
"HDC offers practical obfuscation" | True. Semantic content hidden if atoms secret.
"HDC enables efficient federated computation" | True. With acceptable leakage trade-offs.
"HDC is better than nothing" | True. Defense in depth, data minimization.

10.2 Recommendations

  1. Don't oversell: HDC privacy is not cryptographic security
  2. Know your adversary: Casual vs. sophisticated attackers
  3. Accept leakage: Design applications that tolerate structure leakage
  4. Layer defenses: Combine HDC with other privacy techniques
  5. Stay updated: This is an active research area
