Arm hopes for an SME2 moment
Arm has announced SME2 support in XNNPack, a Google-developed library that accelerates AI inference frameworks such as TensorFlow, LiteRT, PyTorch, ExecuTorch, and the ONNX runtime. XNNPack supports various processor environments, the most consequential of which is Android on Arm. The Scalable Matrix Extension 2 (SME2) adds new instructions to Arm CPUs, analogous to Neon and the Scalable Vector Extensions (SVE/SVE2).
The remarkable thing is that Arm doesn't yet offer any CPUs supporting SME2. It's rare for a processor vendor to have software enablement in place before the hardware is ready. It's a smart move, too, because it lets apps get ready in advance, so users benefit from day one of hardware availability. We expect the next generation of Cortex-X and Cortex-A cores to support SME2. In the meantime, Apple devices, with their custom CPUs, already include the extension, so developers have a compatible platform now.
Whether CPUs should incorporate matrix units, however, is unclear. They occupy a lot of silicon but aren't widely used. Moreover, PCs, smartphones, and servers often integrate NPUs (and GPUs) that deliver far greater performance and power efficiency. Any offload engine, however, adds complexity and latency compared with inline code running on the CPU. Thus, we expect SME2 to benefit latency-sensitive interactive applications, while NPUs handle heavier AI processing for applications that can tolerate some delay.
When Arm announces SME2-enabled CPUs, we'll cover them at XPU.pub. Sign up for the XPU newsletter at https://xpu.pub/email-newsletter-signup/
Other content