Efficient ML Execution Trends

A technical survey of optimized runtimes, compilers, and libraries for CPU-centric inference.

Inference Runtimes

ONNX Runtime: A cross-platform engine from Microsoft that abstracts hardware behind Execution Providers (EPs) such as OpenVINO or XNNPACK; see the first sketch after this list.
OpenVINO: Intel's toolkit for Intel hardware, focusing on INT8 quantization and operator fusion.
TensorFlow Lite: A framework for mobile and edge devices that uses FlatBuffers and the XNNPACK library to reduce memory footprint; see the second sketch after this list.
Core ML: Apple's on-device framework, designed for heterogeneous execution across the CPU, GPU, and Apple Neural Engine (ANE).
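
To make the Execution Provider mechanism concrete, here is a minimal sketch of an ONNX Runtime session that prefers the OpenVINO EP and falls back to the default CPU provider. The model path and input shape are placeholder assumptions, not from this survey.

```python
# Minimal sketch: ONNX Runtime inference through an explicit EP list.
# "model.onnx" and the (1, 3, 224, 224) input shape are hypothetical.
import numpy as np
import onnxruntime as ort

# Providers are tried in order; any operator the preferred EP does not
# support falls back to the default CPUExecutionProvider.
session = ort.InferenceSession(
    "model.onnx",
    providers=["OpenVINOExecutionProvider", "CPUExecutionProvider"],
)

input_name = session.get_inputs()[0].name
x = np.random.rand(1, 3, 224, 224).astype(np.float32)
outputs = session.run(None, {input_name: x})
print(outputs[0].shape)
```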
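Similarly, a sketch of TensorFlow Lite's interpreter API, which loads a FlatBuffer model and, in recent TensorFlow builds, routes supported float operators through XNNPACK by default. The model path is a placeholder.

```python
# Minimal sketch: TensorFlow Lite interpreter over a FlatBuffer model.
# "model.tflite" is a hypothetical path.
import numpy as np
import tensorflow as tf

interpreter = tf.lite.Interpreter(model_path="model.tflite")
interpreter.allocate_tensors()  # plans the static tensor arena up front

inp = interpreter.get_input_details()[0]
out = interpreter.get_output_details()[0]

interpreter.set_tensor(inp["index"], np.zeros(inp["shape"], dtype=np.float32))
interpreter.invoke()
result = interpreter.get_tensor(out["index"])
print(result.shape)
```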

ML Compilers

Compilers such as Apache TVM and XLA automate the generation of optimized machine code from a model graph. The objective is to reduce runtime overhead through graph-level optimizations such as constant folding and kernel fusion, as illustrated in the sketch below.
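
As one concrete example, a minimal sketch of these optimizations using JAX's jit decorator, which lowers the traced graph through XLA. The function body and shapes are illustrative assumptions, not from the survey.

```python
# Minimal sketch: graph-level optimization via jax.jit (XLA backend).
import jax
import jax.numpy as jnp

@jax.jit
def fused(x):
    # The scale vector depends on no input, so XLA can constant-fold
    # it at compile time; the remaining elementwise chain (multiply,
    # tanh, add) is fused into a single kernel pass over x.
    scale = jnp.full((1024,), 0.5) * 2.0
    return jnp.tanh(x * scale) + 1.0

x = jnp.ones((1024, 1024), dtype=jnp.float32)
print(fused(x).shape)  # (1024, 1024)
```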

Niche & Historical Runtimes

Lower-Level Kernel Libraries