1.1 The SLIDE Algorithm: Sub-Linear Deep Learning Engine
The SLIDE architecture, developed at Rice University, addresses the computational cost of neural network training by using Locality Sensitive Hashing (LSH) to identify the neurons likely to activate for a given input, without computing full layer activations.
- O(1) Lookups: Hash-table buckets are retrieved in constant time, so each layer computes activations only for the small set of colliding neurons rather than doing O(N) dense work over all N neurons (see the sketch after this list).
- CPU Optimization: The resulting sparse, irregular workload favors CPUs, exploiting large L3 caches and branch prediction; it maps poorly onto GPU-centric lockstep SIMD execution.
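As a rough illustration of this lookup, the sketch below (plain NumPy, not the published SLIDE code) builds a single SimHash table over a layer's weight rows and computes activations only for the neurons whose hash collides with the input. The single table, the 8-bit hash, and all dimensions are illustrative choices; SLIDE itself uses multiple tables and unions their buckets.

```python
import numpy as np
from collections import defaultdict

rng = np.random.default_rng(0)

D, N, K = 128, 4096, 8                # input dim, neurons in the layer, hash bits (illustrative)
W = rng.standard_normal((N, D))       # one weight row per neuron
planes = rng.standard_normal((K, D))  # random hyperplanes shared by weights and inputs (SimHash)

def simhash(v: np.ndarray) -> int:
    """K-bit signed-random-projection hash of a vector."""
    code = 0
    for bit in (planes @ v) > 0:
        code = (code << 1) | int(bit)
    return code

# Hash every neuron's weight row into a bucket once; during training the tables
# are rebuilt periodically as the weights drift.
buckets = defaultdict(list)
for n in range(N):
    buckets[simhash(W[n])].append(n)

def active_forward(x: np.ndarray):
    """Compute activations only for neurons whose hash collides with the input."""
    active = buckets.get(simhash(x), [])    # constant-time bucket lookup, no O(N) scan
    acts = np.maximum(W[active] @ x, 0.0)   # dense math only on the small active set (ReLU)
    return active, acts

x = rng.standard_normal(D)
ids, acts = active_forward(x)
print(f"{len(ids)} of {N} neurons active for this input")
```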
1.2 ThirdAI and Dynamic Sparsity
ThirdAI commercialized these concepts through the BOLT engine. The objective is to enable training and fine-tuning of large-scale models directly on standard x86 and ARM CPUs.
- Training Efficiency: Dynamic sparsity during training touches only a small fraction of weights per step, avoiding the need for high-bandwidth VRAM (see the sketch after this list).
- Privacy: On-premise deployment for sensitive medical or financial datasets.
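A toy update step, not ThirdAI's BOLT API, assuming an LSH lookup like the one above has already selected a handful of active neurons: both the forward pass and the SGD update touch only those rows, so per-step compute and memory traffic scale with the size of the active set rather than with the layer width, which is why ordinary DRAM suffices.

```python
import numpy as np

rng = np.random.default_rng(1)
D, N = 128, 4096
W = rng.standard_normal((N, D))       # full layer weights live in ordinary RAM
x = rng.standard_normal(D)
target = rng.standard_normal(N)

# Stand-in for an LSH lookup: pretend these 64 neurons were selected for this sample.
active = rng.choice(N, size=64, replace=False)

def sparse_sgd_step(W, x, active, target, lr=0.01):
    """SGD on a squared-error loss, restricted to the active neuron rows."""
    y = W[active] @ x                        # forward pass over 64 rows, not 4096
    grad_y = y - target[active]              # error signal only for the active outputs
    W[active] -= lr * np.outer(grad_y, x)    # rank-1 update touching |active| rows
    return 0.5 * float(grad_y @ grad_y)

loss = sparse_sgd_step(W, x, active, target)
print(f"updated {len(active)} of {N} rows, partial loss = {loss:.3f}")
```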
1.3 Neural Magic: Sparsification and Serving
Neural Magic, an MIT spin-off, focuses on sparsification: pruning 80-90% of a network's weights with minimal accuracy loss. Following their 2025 acquisition by Red Hat, they have pivoted to contributing these optimizations directly to vLLM via the nm-vllm project.
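The basic step behind sparsification can be sketched as unstructured magnitude pruning: zero out the smallest-magnitude weights until a target sparsity is reached. Production pipelines interleave pruning with retraining or distillation to recover accuracy; the snippet below, with illustrative sizes, shows only the pruning step itself.

```python
import numpy as np

def magnitude_prune(weights: np.ndarray, sparsity: float) -> np.ndarray:
    """Return a copy of `weights` with the smallest-magnitude entries zeroed."""
    flat = np.abs(weights).ravel()
    k = int(sparsity * flat.size)            # number of weights to remove
    threshold = np.partition(flat, k)[k]     # k-th smallest magnitude
    return np.where(np.abs(weights) < threshold, 0.0, weights)

rng = np.random.default_rng(0)
W = rng.standard_normal((512, 512))
W_sparse = magnitude_prune(W, sparsity=0.90)
print(f"remaining nonzero fraction: {np.count_nonzero(W_sparse) / W_sparse.size:.2%}")
```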
1.4 Alternative Sparse Methods
- Random Projections: Based on the Johnson-Lindenstrauss lemma, these techniques reduce data dimensionality while approximately preserving pairwise distances, providing a CPU-friendly alternative to deep embedding layers (sketched below).
- Bloom Filters in ML: Probabilistic data structures offering constant-time approximate set-membership tests, with no false negatives and a tunable false-positive rate, for large-scale classification tasks (sketched below).
- MinHash: A technique for estimating the Jaccard similarity of sets, used for large-scale deduplication and clustering before neural processing (sketched below).
- Count-Min Sketch: A probabilistic data structure for frequency estimation over data streams, useful for real-time feature selection on CPUs (sketched below).
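A minimal random-projection sketch, assuming a plain Gaussian projection matrix scaled by 1/sqrt(k); all dimensions are illustrative. Pairwise distances in the projected space stay close to the originals, which is the Johnson-Lindenstrauss guarantee.

```python
import numpy as np

rng = np.random.default_rng(0)
n, d, k = 200, 10_000, 512                     # points, original dim, projected dim (illustrative)
X = rng.standard_normal((n, d))
R = rng.standard_normal((d, k)) / np.sqrt(k)   # Gaussian random projection matrix
Y = X @ R                                      # reduced representation, one matmul on the CPU

# Spot-check that a pairwise distance is approximately preserved.
orig = np.linalg.norm(X[0] - X[1])
proj = np.linalg.norm(Y[0] - Y[1])
print(f"original distance {orig:.1f}, projected distance {proj:.1f}")
```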
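A minimal Bloom filter sketch using double hashing; the bit-array size and hash count are illustrative parameters that set the false-positive rate.

```python
import hashlib

class BloomFilter:
    """Approximate set membership: no false negatives, tunable false-positive rate."""

    def __init__(self, num_bits: int = 1 << 20, num_hashes: int = 7):
        self.m, self.k = num_bits, num_hashes
        self.bits = bytearray(num_bits // 8 + 1)

    def _positions(self, item: str):
        # Double hashing: derive k bit positions from two independent digests.
        h1 = int.from_bytes(hashlib.md5(item.encode()).digest()[:8], "big")
        h2 = int.from_bytes(hashlib.sha1(item.encode()).digest()[:8], "big")
        return [(h1 + i * h2) % self.m for i in range(self.k)]

    def add(self, item: str) -> None:
        for p in self._positions(item):
            self.bits[p // 8] |= 1 << (p % 8)

    def __contains__(self, item: str) -> bool:
        return all(self.bits[p // 8] & (1 << (p % 8)) for p in self._positions(item))

bf = BloomFilter()
bf.add("label:42")
print("label:42" in bf, "label:999" in bf)   # True, almost certainly False
```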
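A minimal MinHash sketch: the fraction of hash functions on which two sets share the same minimum value is an unbiased estimate of their Jaccard similarity. The 128-function signature and the affine hash family are illustrative choices.

```python
import hashlib
import random

random.seed(0)
NUM_HASHES = 128
P = (1 << 61) - 1     # Mersenne prime for the affine hash family h(x) = (a*x + b) mod P
params = [(random.randrange(1, P), random.randrange(0, P)) for _ in range(NUM_HASHES)]

def stable_hash(token: str) -> int:
    """Deterministic 32-bit hash of a token."""
    return int.from_bytes(hashlib.blake2b(token.encode(), digest_size=4).digest(), "big")

def minhash_signature(items):
    hashed = [stable_hash(t) for t in items]
    return [min((a * v + b) % P for v in hashed) for a, b in params]

def estimated_jaccard(sig_a, sig_b):
    return sum(x == y for x, y in zip(sig_a, sig_b)) / len(sig_a)

doc_a = set("the quick brown fox jumps over the lazy dog".split())
doc_b = set("the quick brown fox sleeps under the lazy dog".split())
true_j = len(doc_a & doc_b) / len(doc_a | doc_b)
est_j = estimated_jaccard(minhash_signature(doc_a), minhash_signature(doc_b))
print(f"true Jaccard {true_j:.2f}, MinHash estimate {est_j:.2f}")
```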
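A minimal Count-Min Sketch: a small grid of counters indexed by independent hash functions answers frequency queries over a stream with a bounded overestimate and never an underestimate; the width and depth below are illustrative.

```python
import hashlib

class CountMinSketch:
    """Approximate frequency counts for a stream in O(width * depth) memory."""

    def __init__(self, width: int = 2048, depth: int = 5):
        self.width, self.depth = width, depth
        self.table = [[0] * width for _ in range(depth)]

    def _index(self, item: str, row: int) -> int:
        # One independent hash per row, derived by salting the digest with the row id.
        digest = hashlib.blake2b(item.encode(), digest_size=8, salt=bytes([row])).digest()
        return int.from_bytes(digest, "big") % self.width

    def add(self, item: str, count: int = 1) -> None:
        for row in range(self.depth):
            self.table[row][self._index(item, row)] += count

    def estimate(self, item: str) -> int:
        # Collisions only inflate counters, so the minimum over rows is an upper bound.
        return min(self.table[row][self._index(item, row)] for row in range(self.depth))

cms = CountMinSketch()
for token in ["cpu", "gpu", "cpu", "sparse", "cpu"]:
    cms.add(token)
print(cms.estimate("cpu"), cms.estimate("gpu"))   # 3 and 1 (upper bounds)
```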