AGISystem2 Research

High-Fidelity Runtimes

Execution strategies for high-precision inference on commodity CPU hardware.

Ziroh Labs and Kompact AI

Ziroh Labs develops inference runtimes intended to remove the dependence on GPUs. Its Kompact AI project targets CPU inference without the fidelity loss typical of low-bit quantization.

Full-Precision (BF16) Execution

The technical objective is to run models at Full Precision (BF16) on CPUs by optimizing memory access and computation scheduling. This applies to domains where the approximation errors introduced by 4-bit quantization are not tolerable.
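The gap between BF16 and 4-bit quantization can be illustrated with a minimal sketch. It is not Kompact AI's implementation: the helper names `to_bf16` and `quantize_int4` are hypothetical, the weights are made-up sample values, and the 4-bit scheme assumed here is a simple symmetric per-tensor quantizer (production schemes typically use finer-grained scales).

```python
import struct

def to_bf16(x: float) -> float:
    """Round a finite value to the nearest BF16 value (round-to-nearest-even).

    BF16 keeps the top 16 bits of a float32: 1 sign, 8 exponent, 7 mantissa bits.
    NaN and infinity are not handled in this sketch.
    """
    bits = struct.unpack("<I", struct.pack("<f", x))[0]
    # Adding 0x7FFF plus the lowest kept bit implements round-to-nearest-even.
    rounding_bias = 0x7FFF + ((bits >> 16) & 1)
    bits = (bits + rounding_bias) & 0xFFFF0000
    return struct.unpack("<f", struct.pack("<I", bits))[0]

def quantize_int4(values):
    """Symmetric per-tensor 4-bit quantization, returned in dequantized form.

    Assumption: one shared scale maps the largest magnitude to the integer 7.
    """
    scale = max(abs(v) for v in values) / 7.0 or 1.0
    return [max(-8, min(7, round(v / scale))) * scale for v in values]

# Hypothetical sample weights, chosen only for illustration.
weights = [0.0123, -0.4567, 0.8910, -0.0042, 0.3333]

bf16_errors = [abs(to_bf16(w) - w) for w in weights]
int4_errors = [abs(q - w) for q, w in zip(quantize_int4(weights), weights)]

print("max BF16 error:", max(bf16_errors))
print("max INT4 error:", max(int4_errors))
```

BF16 retains the float32 exponent range, so its error scales with each value's magnitude. The 4-bit quantizer has only 16 levels shared across the tensor, so small weights can collapse to zero.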

Semantic Caching Layers

An internal semantic caching layer, designated "Elephant", aims to detect inputs that are similar to ones already served and return the earlier result instead of running inference again. The design targets repetitive enterprise workloads.
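The general idea of a semantic cache can be sketched as follows. This is an illustration of the technique only, not the "Elephant" design, whose internals are not described here. The names `SemanticCache`, `embed`, `answer`, and `run_model` are hypothetical, the bag-of-words embedding stands in for a learned sentence embedding, and the 0.8 threshold is an arbitrary assumption.

```python
import math
from collections import Counter

def embed(text):
    """Toy embedding: lowercase bag-of-words counts (a stand-in for a real model)."""
    return Counter(text.lower().split())

def cosine(a, b):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm = math.sqrt(sum(v * v for v in a.values())) * \
           math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0

class SemanticCache:
    def __init__(self, threshold=0.8):
        self.threshold = threshold  # minimum similarity that counts as a hit
        self.entries = []           # list of (embedding, response) pairs

    def lookup(self, prompt):
        """Return the response of the most similar stored prompt, or None."""
        query = embed(prompt)
        best_score, best_response = 0.0, None
        for emb, response in self.entries:
            score = cosine(query, emb)
            if score > best_score:
                best_score, best_response = score, response
        return best_response if best_score >= self.threshold else None

    def store(self, prompt, response):
        self.entries.append((embed(prompt), response))

def answer(prompt, cache, run_model):
    """Serve from the cache when possible; otherwise run inference and store."""
    cached = cache.lookup(prompt)
    if cached is not None:
        return cached, True
    response = run_model(prompt)
    cache.store(prompt, response)
    return response, False
```

The linear scan keeps the sketch short; a cache holding many entries would normally use an approximate nearest-neighbour index. The threshold trades hit rate against the risk of returning an answer to a question that only looks similar.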

Performance Analysis: Reported benchmarks indicate throughput of 164 tokens/sec on standard CPUs, with the goal of matching the efficiency of discrete accelerators at specific batch sizes.
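To put the throughput figure in perspective, the short calculation below converts it into per-token latency and total generation time. It assumes the 164 tokens/sec figure is a single-stream decode rate; if it is an aggregate across a batch, per-request latency would be higher. The function names and the 500-token response length are hypothetical.

```python
def per_token_latency_ms(tokens_per_sec: float) -> float:
    """Average time per generated token, in milliseconds."""
    return 1000.0 / tokens_per_sec

def generation_time_s(num_tokens: int, tokens_per_sec: float) -> float:
    """Time to generate num_tokens at a steady decode rate, in seconds."""
    return num_tokens / tokens_per_sec

REPORTED_TOKENS_PER_SEC = 164  # figure quoted in the text above

latency = per_token_latency_ms(REPORTED_TOKENS_PER_SEC)   # about 6.1 ms
duration = generation_time_s(500, REPORTED_TOKENS_PER_SEC)  # about 3.05 s

print(f"{latency:.2f} ms per token, {duration:.2f} s for 500 tokens")
```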

Infrastructure and Independence

These runtimes make it possible to build high-performance AI infrastructure on commercially available commodity hardware, reducing dependence on proprietary, specialized accelerators.

Foundational Math Libraries