Ziroh Labs and Kompact AI
Ziroh Labs develops CPU inference runtimes intended to remove the dependence on GPUs. Its Kompact AI project explores inference that avoids the fidelity loss typical of low-bit quantization.
Full-Precision (BF16) Execution
The technical objective is to run models on CPUs in BF16, the 16-bit format referred to here as full precision, by optimizing memory access and computation scheduling. This matters in domains where the accuracy loss introduced by 4-bit quantization is not tolerable.
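To make the precision trade-off concrete, the following is a minimal sketch that compares the rounding error of BF16 with that of a simple symmetric 4-bit quantizer on synthetic weights. It is not taken from Kompact AI: the function names, the Gaussian weight distribution, and the per-tensor 4-bit scheme are all assumptions chosen for illustration, and real 4-bit schemes (per-group scales, non-uniform codebooks) typically do better than this naive one.

```python
import random
import struct


def to_bf16(x: float) -> float:
    """Round x to the nearest bfloat16 value (ties to even) and return it as a float.

    Illustrative only: NaN, infinity and overflow cases are not handled.
    """
    bits = struct.unpack("<I", struct.pack("<f", x))[0]
    # Add the rounding bias, then drop the low 16 bits of the float32 pattern.
    bits += 0x7FFF + ((bits >> 16) & 1)
    bits &= 0xFFFF0000
    return struct.unpack("<f", struct.pack("<I", bits))[0]


def quantize_int4(values):
    """Naive symmetric per-tensor 4-bit quantization, returned in dequantized form."""
    scale = (max(abs(v) for v in values) / 7.0) or 1.0
    return [max(-8, min(7, round(v / scale))) * scale for v in values]


def mean_abs_error(reference, approx):
    return sum(abs(r - a) for r, a in zip(reference, approx)) / len(reference)


random.seed(0)
# Hypothetical weights: small, zero-centred values, as a stand-in for a model layer.
weights = [random.gauss(0.0, 0.02) for _ in range(10_000)]

bf16_err = mean_abs_error(weights, [to_bf16(w) for w in weights])
int4_err = mean_abs_error(weights, quantize_int4(weights))

print(f"BF16 mean abs error:  {bf16_err:.2e}")
print(f"4-bit mean abs error: {int4_err:.2e}")
```

On data like this the 4-bit error is larger than the BF16 error by a wide margin, which is the gap the BF16 approach is meant to avoid.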
Semantic Caching Layers
An internal semantic caching layer (designated "Elephant") aims to detect inputs that are similar to earlier ones and bypass redundant inference for them. The design targets repetitive enterprise workloads.
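The general idea of a semantic cache can be sketched as follows. This is not the "Elephant" implementation, whose internals are not described here: the class and function names, the 0.8 similarity threshold, and the word-count "embedding" are hypothetical stand-ins. A real system would presumably use a learned embedding model and an approximate nearest-neighbour index rather than a linear scan.

```python
import math
from collections import Counter


def embed(text):
    """Toy stand-in for an embedding model: lowercase word counts."""
    return Counter(text.lower().split())


def cosine(a, b):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[t] * b[t] for t in a if t in b)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


class SemanticCache:
    """Hypothetical cache that returns a stored response for sufficiently similar prompts."""

    def __init__(self, threshold=0.8):
        self.threshold = threshold
        self.entries = []  # list of (embedding, response) pairs

    def lookup(self, prompt):
        query = embed(prompt)
        best_score, best_response = 0.0, None
        for embedding, response in self.entries:
            score = cosine(query, embedding)
            if score > best_score:
                best_score, best_response = score, response
        return best_response if best_score >= self.threshold else None

    def store(self, prompt, response):
        self.entries.append((embed(prompt), response))


def answer(prompt, cache, run_model):
    """Return (response, cache_hit); run_model is only called on a cache miss."""
    cached = cache.lookup(prompt)
    if cached is not None:
        return cached, True
    response = run_model(prompt)
    cache.store(prompt, response)
    return response, False
```

The threshold controls the trade-off: set too low, the cache returns answers to questions that only look alike; set too high, it rarely hits and inference runs almost every time.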
Infrastructure and Independence
These runtimes are intended to enable high-performance AI infrastructure on commodity hardware, reducing dependency on proprietary specialized accelerators.
Foundational Math Libraries
- BLIS (BLAS-like Library Instantiation Software): A framework for instantiating high-performance BLAS-like libraries, exposing fine-grained control over micro-kernels.
- OpenBLAS: An optimized BLAS library based on GotoBLAS2, long used for scientific computing on CPUs.
- Intel MKL: Intel's Math Kernel Library (now oneMKL) for x86, commonly used as a performance baseline when evaluating other CPU runtimes.
- AMD AOCL: The AMD Optimizing CPU Libraries, a suite tuned for AMD processors such as EPYC, targeting numerical precision and throughput.
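To illustrate what "control over micro-kernels" means in the libraries above, here is a minimal pure-Python sketch of a blocked matrix multiply in which all arithmetic happens inside a small tile routine. It shows the loop structure only: production micro-kernels are written in assembly or intrinsics, operate on packed buffers, and choose tile sizes per CPU. The function names and the tile sizes `mr`, `nr`, `kc` used here are illustrative assumptions, not any library's API.

```python
def micro_kernel(A, B, C, i0, j0, k0, mr, nr, kc):
    """Accumulate an mr x nr tile of C from a kc-deep slice of A and B."""
    for i in range(i0, i0 + mr):
        for j in range(j0, j0 + nr):
            acc = C[i][j]
            for k in range(k0, k0 + kc):
                acc += A[i][k] * B[k][j]
            C[i][j] = acc


def blocked_gemm(A, B, mr=4, nr=4, kc=8):
    """Compute C = A @ B by looping over tiles and delegating to micro_kernel."""
    m, k, n = len(A), len(B), len(B[0])
    C = [[0.0] * n for _ in range(m)]
    for k0 in range(0, k, kc):
        for i0 in range(0, m, mr):
            for j0 in range(0, n, nr):
                # Edge tiles are clipped so the kernel never reads out of range.
                micro_kernel(A, B, C, i0, j0, k0,
                             min(mr, m - i0), min(nr, n - j0), min(kc, k - k0))
    return C


# Small usage example with integer-valued floats so results are exact.
A = [[float(i + j) for j in range(7)] for i in range(5)]
B = [[float(i * j) for j in range(3)] for i in range(7)]
C = blocked_gemm(A, B)
```

Keeping the arithmetic in one small routine is what lets a library swap in a kernel tuned for a specific instruction set while the surrounding loops stay the same.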