Inference Runtimes
ONNX Runtime
A cross-platform inference engine from Microsoft that abstracts hardware behind pluggable Execution Providers (EPs) such as OpenVINO or XNNPACK.
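A minimal sketch of selecting EPs through the Python API; the model path and input shape are placeholders, and providers that are not installed fall back to the next one in the list:

```python
import numpy as np
import onnxruntime as ort

# Providers are tried in priority order; if the OpenVINO EP is not
# available, the session falls back to the default CPU provider.
session = ort.InferenceSession(
    "model.onnx",  # placeholder path to any ONNX model
    providers=["OpenVINOExecutionProvider", "CPUExecutionProvider"],
)

input_name = session.get_inputs()[0].name
x = np.random.rand(1, 3, 224, 224).astype(np.float32)  # assumed input shape
outputs = session.run(None, {input_name: x})
```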
OpenVINO
Intel's toolkit for deploying models on Intel CPUs, GPUs, and NPUs, with an emphasis on INT8 quantization and operator fusion.
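A sketch of the OpenVINO Python API, assuming a model already converted to OpenVINO IR (`model.xml` plus its weights file); the device name and input shape are placeholders:

```python
import numpy as np
import openvino as ov

core = ov.Core()
model = core.read_model("model.xml")         # placeholder IR file
compiled = core.compile_model(model, "CPU")  # or "GPU", "AUTO", etc.

x = np.random.rand(1, 3, 224, 224).astype(np.float32)  # assumed shape
results = compiled([x])  # a CompiledModel is directly callable
```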
TensorFlow Lite
Google's framework for mobile and edge devices; its FlatBuffers model format permits memory-mapped loading with minimal overhead, and the XNNPACK library supplies optimized CPU kernels.
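A sketch of the interpreter workflow in Python; the `.tflite` file is a placeholder, and recent builds route supported float operators through XNNPACK by default:

```python
import numpy as np
import tensorflow as tf

interpreter = tf.lite.Interpreter(model_path="model.tflite")  # placeholder
interpreter.allocate_tensors()  # one-time arena allocation for all tensors

input_details = interpreter.get_input_details()
output_details = interpreter.get_output_details()

x = np.random.rand(*input_details[0]["shape"]).astype(np.float32)
interpreter.set_tensor(input_details[0]["index"], x)
interpreter.invoke()
y = interpreter.get_tensor(output_details[0]["index"])
```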
Core ML
Apple's on-device framework designed for heterogeneous execution across the CPU, GPU, and Apple Neural Engine (ANE).
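Core ML models are typically produced with coremltools; the sketch below converts a toy PyTorch module and uses `compute_units` to declare which processors Core ML may schedule onto (the module and shapes are illustrative):

```python
import coremltools as ct
import torch

class Tiny(torch.nn.Module):
    def forward(self, x):
        return torch.relu(x) * 2.0

traced = torch.jit.trace(Tiny().eval(), torch.rand(1, 3, 224, 224))

mlmodel = ct.convert(
    traced,
    inputs=[ct.TensorType(shape=(1, 3, 224, 224))],
    convert_to="mlprogram",
    compute_units=ct.ComputeUnit.ALL,  # CPU, GPU, and ANE; CPU_ONLY etc. also exist
)
mlmodel.save("Tiny.mlpackage")
```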
ML Compilers
Compilers such as Apache TVM and XLA generate optimized machine code directly from a model graph, reducing runtime overhead through graph-level optimizations such as constant folding and kernel fusion.
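These optimizations can be observed with JAX, whose `jax.jit` lowers a function to XLA; in the sketch below the constant subexpression is evaluated at compile time, and the remaining multiply and add are typically fused into a single kernel:

```python
import jax
import jax.numpy as jnp

@jax.jit
def f(x):
    # (2.0 * 3.0) is folded to 6.0 during compilation; the multiply
    # and add over x are candidates for fusion into one kernel.
    return x * (2.0 * 3.0) + 1.0

print(f(jnp.arange(4.0)))  # [ 1.  7. 13. 19.]
```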
Niche & Historical Runtimes
- Apache MXNet: An early, highly scalable deep learning framework notable for memory-efficient execution and symbolic graph optimization; retired to the Apache Attic in 2023.
- Paddle Lite: Baidu's high-performance inference engine for mobile, embedded, and IoT devices.
- SNPE (Snapdragon Neural Processing Engine): Qualcomm's SDK for executing networks on Hexagon DSPs and Adreno GPUs, an early example of mobile-first AI acceleration.
- Tengine: An open-source, lightweight inference engine targeting ARM-based embedded and IoT devices.