This document analyzes the potential for implementing privacy-preserving computation using HDC primitives. We explore the parallels with homomorphic encryption, the possibility of cloud-based reasoning on encrypted knowledge, information leakage risks, and applications in federated learning.

1. The Vision: Compute Without Revealing

The fundamental question: Can we perform reasoning on knowledge without the reasoner knowing what the knowledge represents?

Homomorphic Encryption Analogy:

In homomorphic encryption, we can compute f(E(x)) = E(f(x)): computation runs on encrypted data and yields encrypted results that decrypt to the correct answer.

In HDC, can we achieve: reason(Encode(knowledge)) → Encode(conclusions)?

1.1 The HDC Privacy Hypothesis

HDC has properties that suggest privacy-preserving potential:

  1. Distributed Representation: Information is spread across all dimensions
  2. Deterministic Operations: Binding and bundling are mathematical, not learned
  3. Reversibility: for self-inverse bindings (e.g. XOR), (A BIND B) BIND B = A, without knowing what A or B represent
  4. Composition: Complex structures built from atomic vectors

The key insight: If the atomic concept vectors are secret keys, can the composite structures be safely processed by untrusted parties?

2. Architecture: Secret Atoms, Public Reasoning

2.1 The Basic Scheme

Setup Phase (Client - Private):
1. Generate random seed S (the master secret)
2. For each concept name, derive vector: V_concept = PRNG(Hash(S || "concept"))
3. Keep S and the name→vector mapping private

Encoding Phase (Client - Private):
4. Encode facts using secret atom vectors
5. fact_vec = Relation BIND (Pos1 BIND Arg1) BIND (Pos2 BIND Arg2)
6. Bundle all facts: KB = bundle([fact1, fact2, ...])

Reasoning Phase (Cloud - Public):
7. Send KB (bundled vector) to cloud
8. Cloud performs reasoning operations (prove, query)
9. Cloud returns result vectors

Decoding Phase (Client - Private):
10. Client decodes result vectors using secret atom vectors
11. Match results against known concept vectors
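
To make the scheme concrete, here is a minimal toy sketch of the setup and encoding phases. The helper names (atom, bind, bundle, sim) and the FNV-1a/mulberry32 seed derivation are illustrative stand-ins rather than a fixed API; Section 7 sketches a fuller client/cloud split.

// --- Toy HDC primitives over ±1 vectors (illustrative only) ---
const D = 10000;

// Deterministic "secret" atom: hash(seed || name) seeds a small PRNG.
function fnv1a(str) {
  let h = 0x811c9dc5;
  for (let i = 0; i < str.length; i++) {
    h ^= str.charCodeAt(i);
    h = Math.imul(h, 0x01000193);
  }
  return h >>> 0;
}
function mulberry32(a) {
  return function () {
    a |= 0; a = (a + 0x6d2b79f5) | 0;
    let t = Math.imul(a ^ (a >>> 15), 1 | a);
    t = (t + Math.imul(t ^ (t >>> 7), 61 | t)) ^ t;
    return ((t ^ (t >>> 14)) >>> 0) / 4294967296;
  };
}
const atom = (seed, name) => {
  const rnd = mulberry32(fnv1a(seed + "||" + name));
  return Array.from({ length: D }, () => (rnd() < 0.5 ? 1 : -1));
};

// Self-inverse binding (element-wise product), majority bundling, cosine similarity.
const bind = (a, b) => a.map((v, i) => v * b[i]);
const bundle = (vs) => vs[0].map((_, i) => (vs.reduce((s, v) => s + v[i], 0) >= 0 ? 1 : -1));
const sim = (a, b) => a.reduce((s, v, i) => s + v * b[i], 0) / D;

// Client-side setup and encoding: every atom is derived from the master secret S.
const S = "master-secret";
const encodeFact = (rel, a1, a2) =>
  bind(bind(atom(S, rel), bind(atom(S, "Pos1"), atom(S, a1))),
       bind(atom(S, "Pos2"), atom(S, a2)));

const KB = bundle([
  encodeFact("loves", "John", "Mary"),
  encodeFact("parent", "Mary", "Alice"),
]);
// KB is all the cloud ever receives: a single ±1 vector with no attached semantics.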

2.2 Why This Might Work

The cloud sees only high-dimensional vectors. Without knowing the atom vectors, it cannot map those vectors back to concept names, so the semantic content of the knowledge base stays opaque.

Key Property: HDC operations are structure-preserving regardless of what the vectors represent. The cloud computes correctly without understanding the semantics.

3. Information Leakage Analysis

Unfortunately, HDC representations leak information in several ways. This section analyzes what an adversary can learn.

3.1 Structural Information Leakage

Attack 1: Knowledge Base Size

The bundled KB vector doesn't hide how many facts were bundled. An adversary can estimate the number of facts from the vector's statistical properties.
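
For instance, if the KB is shipped as an unthresholded component-wise sum (an assumption for this sketch; a thresholded ±1 bundle hides the count better), each component is a sum of roughly independent ±1 values, so its variance approximates the number of bundled facts:

// Assumption: kbSum is the component-wise sum of k near-random ±1 fact vectors.
const randVec = (d) => Array.from({ length: d }, () => (Math.random() < 0.5 ? 1 : -1));
const sumBundle = (vs) => vs[0].map((_, i) => vs.reduce((s, v) => s + v[i], 0));

// Per-component variance of a sum of k independent ±1 values is k,
// so the mean squared component estimates the number of bundled facts.
const estimateFactCount = (kbSum) => kbSum.reduce((s, v) => s + v * v, 0) / kbSum.length;

const facts = Array.from({ length: 7 }, () => randVec(10000));
console.log(estimateFactCount(sumBundle(facts)));  // ≈ 7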

Attack 2: Repeated Concepts

If the same concept appears in multiple facts, the KB will have higher similarity to that concept's vector. By probing, or by comparing encoded vectors against one another, an adversary can detect "hot spots": frequently occurring patterns.

Attack 3: Query Correlation

If the client sends multiple queries, the adversary can correlate them. Queries about related concepts will have detectable similarity patterns.

3.2 Similarity-Based Attacks

The fundamental problem: similarity is observable.

Given two encoded facts F1 and F2:
sim(F1, F2) reveals structural relationship

If F1 = loves(John, Mary) and F2 = loves(John, Alice):
sim(F1, F2) > random baseline
→ Adversary learns: F1 and F2 share structure (same relation, overlapping arguments)

This is analogous to the problem of order-preserving encryption leaking ordering information.
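
The effect is easy to reproduce. The sketch below assumes facts are encoded as bundles of role-filler pairs (a common HDC encoding in which shared constituents remain visible as similarity) and reuses the toy helpers (atom, bind, bundle, sim, S) from the Section 2.1 sketch:

// Role-filler bundle encoding (an assumption for this sketch).
const encodeFactRF = (rel, a1, a2) =>
  bundle([atom(S, rel), bind(atom(S, "Pos1"), atom(S, a1)), bind(atom(S, "Pos2"), atom(S, a2))]);

const F1 = encodeFactRF("loves", "John", "Mary");
const F2 = encodeFactRF("loves", "John", "Alice");  // shares the relation and first argument
const F3 = encodeFactRF("owns", "Bob", "Car");      // structurally unrelated

console.log(sim(F1, F2));  // ≈ 0.5: shared structure is visible without knowing any atom
console.log(sim(F1, F3));  // ≈ 0.0: random baseline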

3.3 Dictionary Attacks

If the adversary knows the domain vocabulary (e.g., all possible relation names):

For each candidate concept C in dictionary:
1. Generate probe vector VC (using same PRNG algorithm)
2. Compute sim(KB, VC)
3. High similarity → C is likely in the KB
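
A sketch of this probe loop, reusing the toy helpers and the KB from the Section 2.1 sketch; it probes with fully encoded candidate facts, which presumes the adversary can reproduce the atom derivation (seed, encoding scheme, and vocabulary):

// Dictionary attack: enumerate candidate facts, re-encode them, and probe the public KB.
const relations = ["loves", "parent", "owns"];
const names = ["John", "Mary", "Alice", "Bob"];

const hits = [];
for (const rel of relations)
  for (const a1 of names)
    for (const a2 of names) {
      const probe = encodeFact(rel, a1, a2);   // only possible with the master seed S
      const s = sim(KB, probe);
      if (s > 0.2) hits.push({ fact: `${rel}(${a1}, ${a2})`, s });
    }
console.log(hits);  // recovers loves(John, Mary) and parent(Mary, Alice)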

Mitigation: This attack requires the adversary to reproduce the atom derivation, which means knowing the master seed S. If S is truly secret, the adversary cannot generate correct probe vectors. However, an adversary who observes enough query-response pairs may be able to learn the mapping statistically.

3.4 What Cannot Be Hidden

Information Type | Hidden? | Notes
Exact concept names | Yes (if seed secret) | For hash/PRNG strategies, names can be hashed and not stored; EXACT uses a session-local dictionary for atom IDs
Number of facts | Partially | Estimable from vector statistics
Structural patterns | No | Similarity reveals shared structure
Query patterns | No | Repeated/similar queries are detectable
Reasoning complexity | No | Proof depth visible from computation
Knowledge graph topology | Partially | Connectivity patterns may leak

4. Comparison with Homomorphic Encryption

4.1 Fully Homomorphic Encryption (FHE)

True FHE provides:

- Semantic security: ciphertexts reveal nothing about the underlying plaintext
- General computation: any function expressible as a circuit can be evaluated over ciphertexts
- Provable guarantees: security reduces to well-studied hardness assumptions

HDC provides none of these in a cryptographic sense.

4.2 What HDC Offers Instead

Property | FHE | HDC "Privacy"
Security model | Cryptographic (provable) | Obfuscation (heuristic)
Information leakage | None (semantic security) | Structural patterns leak
Computation speed | Very slow (10,000x+ overhead) | Near real-time
Supported operations | Any (with circuit conversion) | HDC primitives only
Key management | Complex (bootstrapping) | Simple (single seed)
Practical today? | Limited applications | Yes, if leakage acceptable

4.3 The Honest Assessment

HDC is NOT homomorphic encryption.

It provides practical obfuscation with known leakage, not cryptographic security with provable guarantees. Use cases must accept this limitation.

5. Partial Homomorphic Properties

While not fully homomorphic, HDC does exhibit some homomorphic-like properties:

5.1 Additive Homomorphism (Bundling)

E(A) + E(B) = E(A + B)

bundle(encode(fact1), encode(fact2)) = encode(KB containing fact1 and fact2)

Bundling combines encoded facts without decoding them.

5.2 Multiplicative Homomorphism (Binding)

E(A) BIND E(B) = E(A BIND B)

The cloud can bind vectors without knowing what they represent.
This enables constructing new composite concepts from existing ones.

5.3 Query Homomorphism

Given: KB = encode(facts), Q = encode(query)
Result = KB BIND Q^-1 (unbind the query from the KB; for self-inverse bindings such as XOR, Q^-1 = Q)

The cloud computes Result without knowing:
- What facts are in KB
- What the query asks
- What the result means

Client decodes Result using secret atom vectors.
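
A round-trip sketch under the toy encoding of Section 2.1 (self-inverse binding, so Q^-1 = Q; names and the candidate list are illustrative): the cloud unbinds an opaque query from an opaque KB, and only the client can turn the result back into a name.

// Client builds an opaque query for loves(John, ?): bind the known parts, leave Pos2 open.
const Q = bind(
  bind(atom(S, "loves"), bind(atom(S, "Pos1"), atom(S, "John"))),
  atom(S, "Pos2")
);

// Cloud side: pure vector algebra, no semantics.
const resultVec = bind(KB, Q);

// Client side: match the opaque result against known atoms to recover the answer.
const candidates = ["John", "Mary", "Alice", "Bob"];
const decoded = candidates
  .map((c) => ({ name: c, s: sim(resultVec, atom(S, c)) }))
  .sort((a, b) => b.s - a.s);
console.log(decoded[0].name);  // "Mary"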

6. Federated Learning Applications

HDC's properties make it interesting for federated scenarios where multiple parties contribute knowledge without revealing it.

6.1 Federated Knowledge Aggregation

Scenario: Multiple hospitals want to combine medical knowledge without sharing patient data.

Protocol:
1. All parties agree on master seed S (shared secret)
2. Each party encodes their local knowledge: KB_i = encode(local_facts_i)
3. A coordinator bundles: KB_global = bundle(KB_1, KB_2, ..., KB_n)
4. Any party can query KB_global using the shared encoding
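
A sketch of steps 2-4 with two hospitals and the toy helpers from Section 2.1 (the medical facts are purely illustrative):

// Each party encodes its own facts with the shared seed; the coordinator only bundles vectors.
const KB1 = bundle([
  encodeFact("treats", "DrugA", "DiseaseX"),
  encodeFact("causes", "DiseaseX", "SymptomZ"),
]);
const KB2 = bundle([encodeFact("treats", "DrugB", "DiseaseX")]);

const KBglobal = bundle([KB1, KB2]);  // the coordinator never decodes anything

// Any party holding S can probe the combined knowledge.
console.log(sim(KBglobal, encodeFact("treats", "DrugA", "DiseaseX")));  // clearly above 0
console.log(sim(KBglobal, encodeFact("treats", "DrugC", "DiseaseX")));  // ≈ 0: never contributed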

Privacy Properties:

- The coordinator and any outside observer handle only encoded vectors; raw facts and patient data never leave the contributing parties
- After bundling, individual facts are not directly attributable to a specific party
- Every party holding the shared seed S can decode, so the scheme protects against the coordinator and outsiders, not against the other participants

6.2 Private Query Answering

Scenario: User wants to query a cloud knowledge base without revealing what they're asking.

Protocol:
1. User encodes query with private atom vectors: Q = encode(query)
2. Cloud computes: candidates = KB BIND Q
3. Cloud returns top-k similar vectors from candidates
4. User decodes results locally

Analysis:

The query's semantic content stays hidden from the cloud, but the query vectors themselves are observed. As noted in Section 3.1 (Attack 3), repeated or related queries can be correlated, so query patterns leak even when query content does not.

6.3 Differential Privacy Integration

HDC's bundling operation is amenable to differential privacy:

KB_noisy = bundle(KB, noise_vector_1, noise_vector_2, ...)

Adding random "noise facts" provides plausible deniability:
- The presence of any specific fact cannot be proven
- Query accuracy degrades gracefully with noise level
- Standard differential privacy guarantees may apply
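
A sketch of the noisy bundling idea, again with the Section 2.1 toy helpers; here the "noise facts" are plain random ±1 vectors standing in for encoded decoys:

// Bundle random decoy vectors into the KB: real facts stay detectable, but more weakly.
const noise = Array.from({ length: 6 }, () =>
  Array.from({ length: D }, () => (Math.random() < 0.5 ? 1 : -1)));
const KBnoisy = bundle([KB, ...noise]);

const f = encodeFact("loves", "John", "Mary");
console.log(sim(KB, f));       // ≈ 0.5 without noise
console.log(sim(KBnoisy, f));  // noticeably lower: accuracy degrades with the noise level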

7. Implementation Architecture

7.1 Client-Side (Trusted)

// Assumes the HDC library provides SHA256, PRNG_Vector, bind, and similarity primitives.
class PrivateHDCClient {
  constructor(masterSeed, geometry) {
    this.seed = masterSeed;      // Keep secret!
    this.geometry = geometry;    // Vector dimensionality / encoding parameters for PRNG_Vector
    this.atomCache = new Map();
  }

  // Generate deterministic but secret atom vector
  getAtom(conceptName) {
    if (!this.atomCache.has(conceptName)) {
      const hash = SHA256(this.seed + conceptName);
      const vec = PRNG_Vector(hash, this.geometry);
      this.atomCache.set(conceptName, vec);
    }
    return this.atomCache.get(conceptName);
  }

  // Encode a fact using secret atoms
  encodeFact(relation, ...args) {
    let vec = this.getAtom(relation);
    for (let i = 0; i < args.length; i++) {
      const posVec = this.getAtom(`Pos${i+1}`);
      const argVec = this.getAtom(args[i]);
      vec = bind(vec, bind(posVec, argVec));
    }
    return vec;  // Safe to send to cloud
  }

  // Decode result by matching against known atoms
  decodeResult(resultVec, candidates) {
    return candidates
      .map(c => ({ name: c, sim: similarity(resultVec, this.getAtom(c)) }))
      .sort((a, b) => b.sim - a.sim);
  }
}

7.2 Cloud-Side (Untrusted)

class CloudHDCService {
  constructor() {
    this.kb = null;  // Bundled vector, semantics unknown
  }

  // Receive encoded KB from client
  loadKB(encodedKB) {
    this.kb = encodedKB;  // Just a vector, no meaning
  }

  // Process query - pure HDC operations
  query(queryVec) {
    // Unbind the query from the KB (binding is self-inverse in this scheme)
    const result = bind(this.kb, queryVec);
    return result;  // Client will decode
  }

  // Prove goal - returns proof structure, not semantics
  prove(goalVec, ruleVecs) {
    // Standard backward chaining on vectors (backwardChain is assumed to be provided elsewhere)
    // Cloud doesn't know what's being proven
    return this.backwardChain(goalVec, ruleVecs);
  }

  // Federated aggregation
  aggregate(kbVectors) {
    return bundle(kbVectors);
  }
}
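
A sketch of how the two sides might be wired together, assuming the bind, bundle, similarity, SHA256, and PRNG_Vector primitives referenced above exist and that geometry carries the vector dimensionality (the query construction mirrors Section 5.3):

// Client (trusted): encode locally with the secret seed.
const client = new PrivateHDCClient("master-secret", { dimensions: 10000 });
const kb = bundle([
  client.encodeFact("loves", "John", "Mary"),
  client.encodeFact("parent", "Mary", "Alice"),
]);

// Cloud (untrusted): stores and processes opaque vectors only.
const cloud = new CloudHDCService();
cloud.loadKB(kb);

// Query loves(John, ?): bind the known parts, leave the unknown slot as the bare Pos2 atom.
const q = bind(
  bind(client.getAtom("loves"), bind(client.getAtom("Pos1"), client.getAtom("John"))),
  client.getAtom("Pos2")
);
const res = cloud.query(q);

// Only the client can interpret the result.
console.log(client.decodeResult(res, ["John", "Mary", "Alice"])[0]);  // → { name: "Mary", sim: ... }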

8. Threat Model and Limitations

8.1 What We Protect Against

- An honest-but-curious cloud reading semantic content: without the atom vectors, stored and processed vectors do not reveal concept names
- Casual disclosure of knowledge base contents if the vectors leak, provided the master seed S stays secret

8.2 What We Do NOT Protect Against

- Structural and similarity analysis (Section 3): fact counts, shared structure, and query correlation remain observable
- Adversaries who obtain the seed or can otherwise reproduce the atom derivation (dictionary attacks)
- Sophisticated attackers who accumulate many query-response pairs and learn the mapping statistically

8.3 Appropriate Use Cases

Good fit:
- Applications that can tolerate structural leakage and want defense in depth or data minimization
- Federated aggregation among mutually trusting parties who share the seed
- Low-latency cloud reasoning where FHE's overhead is prohibitive

Bad fit:
- Data requiring cryptographic confidentiality guarantees
- Threat models with sophisticated, well-resourced adversaries
- Settings where even query patterns or knowledge structure are sensitive

9. Research Directions

9.1 Hybrid HDC-FHE Systems

Combine HDC's efficiency with FHE's security for critical operations:

- Use HDC for bulk knowledge storage and retrieval (fast, moderate privacy)
- Use FHE for sensitive computations (slow, strong privacy)
- Develop protocols for secure handoff between layers

9.2 Oblivious HDC Protocols

Apply techniques from secure multi-party computation, for example secret-sharing the knowledge base across non-colluding servers or using oblivious transfer to retrieve atom vectors, so that no single party ever holds both the vectors and their meaning.

9.3 Privacy-Preserving Similarity

Develop protocols where similarity can be computed without revealing either vector:

Alice has vector A, Bob has vector B.
Compute sim(A, B) such that:
- Alice learns only sim(A, B)
- Bob learns only sim(A, B)
- Neither learns the other's vector

9.4 Leakage Quantification

Formal analysis of information leakage, for example bounding how many bits about the knowledge base an adversary can extract from the published vectors and observed query traffic.

10. Conclusions

10.1 The Honest Summary

Claim | Reality
"HDC is homomorphic encryption" | False. No cryptographic security guarantees.
"HDC provides perfect privacy" | False. Structural patterns leak.
"HDC offers practical obfuscation" | True. Semantic content hidden if atoms secret.
"HDC enables efficient federated computation" | True. With acceptable leakage trade-offs.
"HDC is better than nothing" | True. Defense in depth, data minimization.

10.2 Recommendations

  1. Don't oversell: HDC privacy is not cryptographic security
  2. Know your adversary: Casual vs. sophisticated attackers
  3. Accept leakage: Design applications that tolerate structure leakage
  4. Layer defenses: Combine HDC with other privacy techniques
  5. Stay updated: This is an active research area
