Seamless AI Acceleration for Developers Everywhere
To scale the AI opportunity, developers need fast paths to AI deployment, together with performance that best suits their specific workload. Arm is dedicated to maximizing AI performance across the entirety of the Arm platform, helping to ensure seamless acceleration for every developer, every model, and every workload.
Robust AI Ecosystem
Arm connects developers with a vibrant ecosystem of machine learning software providers, frameworks, and open-source projects supporting the latest AI features.
Open Source Technology
Arm Kleidi technologies are integrated into popular frameworks, accelerating models on Arm CPUs without extra developer effort.
Developer Enablement
Community contributions and extensive resources include usage guides, learning paths, and demos to empower developers.
Connecting Developers to a Robust AI Software Ecosystem
The purpose of Arm Kleidi is to collaborate with leading AI frameworks, cloud service providers, and the machine learning ISV community to deliver out-of-the-box inference performance improvements across the full ML stack for billions of workloads, with no extra developer work or expertise required.
PyTorch
Arm works closely with the PyTorch community, helping to ensure models running on PyTorch just work on Arm and driving seamless acceleration for even the most demanding AI workloads.
BERT-Large
Arm has been working to improve PyTorch inference performance on Arm CPUs, including optimizing the primary execution modes, Eager Mode and Graph Mode.
Integrating Kleidi improves Llama model inference by up to 18 times and Gemma 2 2B by 15 times, and lifts performance for natural language processing (NLP) models, including a 2.2 times uplift on BERT-Large.
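As an illustrative sketch only (the model choice and timing loop are ours, not Arm's demo code), the snippet below shows how a developer might compare Eager Mode with Graph Mode via torch.compile for BERT-Large on an Arm CPU; Kleidi-backed kernels are engaged automatically by the framework, with no extra code.

```python
# Minimal sketch: comparing PyTorch Eager Mode and Graph Mode (torch.compile)
# inference for BERT-Large on an Arm CPU. Model and input are illustrative;
# optimized kernels are picked up by the framework where available.
import time
import torch
from transformers import AutoModel, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("bert-large-uncased")
model = AutoModel.from_pretrained("bert-large-uncased").eval()
inputs = tokenizer("Arm CPUs accelerate AI inference.", return_tensors="pt")

def bench(m, n=10):
    with torch.inference_mode():
        m(**inputs)                      # warm-up (triggers compilation)
        start = time.perf_counter()
        for _ in range(n):
            m(**inputs)
    return (time.perf_counter() - start) / n

eager_s = bench(model)                   # Eager Mode baseline
graph_s = bench(torch.compile(model))    # Graph Mode via torch.compile
print(f"eager: {eager_s * 1e3:.1f} ms, compiled: {graph_s * 1e3:.1f} ms")
```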
Llama 3.1 8B
Using Arm Neoverse V2-based Graviton4 processors, we can achieve an estimated 12 times uplift in token generation rate for a chatbot demo with KleidiAI optimizations applied to PyTorch.
This demo shows how easy it is to build AI applications using LLMs, making use of existing Arm-based compute capacity.
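A minimal sketch of such an application is shown below, assuming access to Meta's gated Llama 3.1 8B Instruct weights on Hugging Face; the model ID and parameters are illustrative rather than taken from Arm's demo.

```python
# Illustrative sketch of a simple CPU-based chatbot turn with Hugging Face
# transformers on an Arm server (for example, a Graviton4 instance).
import torch
from transformers import pipeline

chat = pipeline(
    "text-generation",
    model="meta-llama/Llama-3.1-8B-Instruct",  # assumed, gated model ID
    torch_dtype=torch.bfloat16,                # bfloat16 suits Neoverse V2
)

messages = [{"role": "user", "content": "Summarize the Arm architecture in one sentence."}]
out = chat(messages, max_new_tokens=128)
print(out[0]["generated_text"][-1]["content"])  # assistant's reply
```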
RoBERTa
AWS collaborated with Arm to optimize the PyTorch torch.compile feature for Neoverse V1-based Graviton3 processors with Arm Compute Library (ACL) kernels using oneDNN.
This optimization results in up to 2 times inference performance improvement for the most popular NLP models on Hugging Face.
FunASR Paraformer-Large
FunASR is an advanced open-source automatic speech recognition (ASR) toolkit developed by Alibaba DAMO Academy.
By integrating ACL with PyTorch via oneDNN, we have seen a 2.3 times performance improvement when running the Paraformer model on Neoverse N2-based AliCloud Yitian710 processors.
ExecuTorch
Together, Arm and ExecuTorch, a lightweight ML framework, enable efficient on-device inference at the edge.
Stable Audio Open
Stability AI and Arm have partnered to accelerate on-device generative AI, unlocking real-time audio generation capabilities without the need for an internet connection.
Through model distillation and Arm KleidiAI, Stable Audio Open now delivers text-to-audio generation on Arm-based smartphones 30 times faster than before, letting users create high-quality sounds at the edge in seconds.
Llama 3.2 1B
Thanks to the collaborative efforts of Arm and Meta, AI developers can now run quantized Llama 3.2 models up to 20% faster on Arm CPUs than before.
By integrating KleidiAI with ExecuTorch and developing optimized quantization schemes, we have achieved speeds of over 350 tokens per second on the prefill stage for generative AI workloads on mobile.
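For orientation, the sketch below follows ExecuTorch's documented export flow, lowering a small stand-in PyTorch module to a .pte program via the XNNPACK backend, the route through which KleidiAI micro-kernels are reached on Arm CPUs. The real Llama 3.2 recipe and its quantization scheme are more involved; quantization is omitted here for brevity.

```python
# Sketch of the ExecuTorch export flow: capture a PyTorch model, lower it
# through the Edge dialect, delegate supported ops to the XNNPACK backend,
# and serialize a runtime-loadable .pte file. TinyMLP is a stand-in model.
import torch
from torch.export import export
from executorch.exir import to_edge
from executorch.backends.xnnpack.partition.xnnpack_partitioner import XnnpackPartitioner

class TinyMLP(torch.nn.Module):
    def __init__(self):
        super().__init__()
        self.net = torch.nn.Sequential(
            torch.nn.Linear(64, 128), torch.nn.ReLU(), torch.nn.Linear(128, 8)
        )

    def forward(self, x):
        return self.net(x)

example_inputs = (torch.randn(1, 64),)
exported = export(TinyMLP().eval(), example_inputs)  # capture the graph
edge = to_edge(exported)                             # convert to Edge dialect
edge = edge.to_backend(XnnpackPartitioner())         # delegate ops to XNNPACK
program = edge.to_executorch()                       # emit ExecuTorch program

with open("tiny_mlp.pte", "wb") as f:
    f.write(program.buffer)                          # runtime-loadable binary
```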
Llama.cpp
To demonstrate the capability of Arm-based CPUs for LLM inference, Arm and partners are optimizing the int4 and int8 kernels implemented in llama.cpp to leverage newer Arm instructions, such as the Neon dot product (dotprod) and int8 matrix multiply (i8mm) extensions.
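As a hedged illustration of how these kernels are exercised in practice, the snippet below uses the llama-cpp-python bindings to run a 4-bit (Q4_0) GGUF model on CPU threads; the model path is hypothetical, and any Q4_0-quantized GGUF file will do.

```python
# Minimal sketch: running a Q4_0 (4-bit) GGUF model on Arm CPU threads via
# the llama-cpp-python bindings, which wrap llama.cpp's optimized kernels.
from llama_cpp import Llama

llm = Llama(
    model_path="models/llama-3-8b-q4_0.gguf",  # hypothetical local path
    n_threads=8,                               # match available CPU cores
    n_ctx=2048,                                # context window
)

out = llm("Explain KV caching in one paragraph.", max_tokens=128)
print(out["choices"][0]["text"])
```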
Custom SLM
AWS and Arm have fine-tuned the TinyLlama 1.1B SLM to create a chatbot for a car manual, enabling drivers to interact directly with their vehicle. Using KleidiAI, SLM inference is 10 times faster than before on Arm Cortex-A76 CPUs, achieving response times of 3 seconds.
TinyLlama 1.1B
Using llama.cpp with KleidiAI, VicOne accelerated performance, doubling prefill speed and achieving a 60% uplift in encode. This partnership enables fast in-vehicle cybersecurity threat detection by reducing cloud dependency, lowering costs, and keeping data secure onboard.
TinyStories
TinyStories is a dataset containing words a typical 3-year-old might understand, used to train and evaluate small models with fewer than 10M parameters. When running a TinyStories-trained model on the Arm Cortex-A320 CPU, a performance uplift of over 70% has been achieved.
Llama 3.3 70B
In partnership with Meta, and leveraging KleidiAI with 4-bit quantization, the model achieved performance similar to the far larger Llama 3.1 405B, sustaining a consistent 50 tokens per second when deployed on Arm Neoverse-powered Google Axion processors.
Phi 3 3.8B
Due to our optimizations, time-to-first-token (TTFT) for Microsoft’s Phi 3 LLM is accelerated by around 190% when running a chatbot demo on the Arm Cortex-X925 CPU used in premium smartphones.
Llama 3 8B
Running a text generation demo on Graviton3 processors with our optimizations achieves a 2.5 times performance uplift for TTFT and over 35 tokens per second in the text generation phase, which is more than sufficient for real-time use cases.
Other Leading Frameworks
To maximize AI performance across the entirety of the Arm compute platform, we are dedicated to optimizing inference workloads across all major AI and ML frameworks.
MNN
MNN is an open-source deep learning framework developed by Alibaba. Our partnership helps improve performance and efficiency for on-device multimodal use cases.
As demonstrated with the multilingual instruction-tuned Qwen2-VL 2B model, integrating Kleidi with MNN accelerates prefill performance by 57% and decode by 28%.
OpenCV
With increasing demand for advanced, energy-efficient computer vision (CV) at the edge, KleidiCV helps ensure optimized performance for CV applications on Arm CPUs.
With KleidiCV now integrated into OpenCV 4.11, developers benefit from four times faster processing for key image processing tasks such as blurring, filtering, rotation, and resizing. This acceleration helps boost performance for image segmentation, object detection, and recognition use cases.
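Because KleidiCV plugs in underneath OpenCV, existing code benefits without modification. The snippet below simply exercises the kinds of primitives it accelerates; the image and parameter values are illustrative.

```python
# Illustrative OpenCV snippet touching image-processing primitives that
# KleidiCV accelerates transparently in OpenCV 4.11+ on Arm CPUs: the same
# API calls simply run faster, with no code changes.
import cv2
import numpy as np

img = np.random.randint(0, 256, (1080, 1920, 3), dtype=np.uint8)  # stand-in frame

blurred = cv2.GaussianBlur(img, (5, 5), sigmaX=1.5)                     # blur
rotated = cv2.rotate(img, cv2.ROTATE_90_CLOCKWISE)                      # rotation
resized = cv2.resize(img, (960, 540), interpolation=cv2.INTER_LINEAR)   # resizing
print(blurred.shape, rotated.shape, resized.shape)
```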
MediaPipe
Arm’s partnership with Google AI Edge on MediaPipe and XNNPACK is accelerating AI workloads on current and future Arm CPUs. This enables developers to deliver outstanding AI performance for mobile, web, edge and IoT, using numerous LLMs, like Gemma and Falcon.
Thanks to Kleidi integration with MediaPipe via XNNPACK, a 30% acceleration in TTFT has been achieved when running a Gemma 1 2B chatbot demo on Arm-based premium smartphones.
Angel
Tencent’s Angel ML framework supports Hunyuan LLM, available in sizes from 1B to over 300B parameters. It enables AI capabilities across a wide range of devices, including smartphones and Windows on Arm PCs.
Our partnership was announced at the 2024 Tencent Global Digital Ecosystem Summit and is having a positive impact on real-world workloads by providing users with even more powerful and efficient on-device AI services across Tencent’s many applications.
Key Developer Technologies for Accelerating CPU Performance
Arm Kleidi includes the latest developer enablement technologies designed to advance AI model capability, accuracy, and speed. This helps ensure AI workloads get the best out of the underlying Arm Cortex-A, Arm Cortex-X, or Arm Neoverse CPU.
The KleidiAI and KleidiCV libraries provide lightweight kernels designed to make it easy for machine learning (ML) and computer vision (CV) frameworks to target optimum performance and leverage the latest features for enhancing AI and CV in Arm CPU-based designs.
The Arm Compute Library (ACL) is a comprehensive and flexible library that enables independent software vendors to source ML functions optimized for Cortex-A and Neoverse CPUs. The library is OS agnostic and portable to Android, Linux, and bare-metal systems.
Latest News and Resources

AI Workloads
Guide to Understanding AI Inference on CPU
Demand for running AI workloads on CPU is growing. Our helpful guide explores the benefits and considerations for CPU inference across a range of sectors.

Generative AI
The Role of Generative AI in Business Transformation
Explore how to unlock the full potential of generative AI and the role of Arm in leading this transformation.

Software AI Acceleration
Why Software is Crucial to Achieving AI’s Full Potential
Discover why software is the key to implementing AI and how to accelerate the creation of high-performance and secure AI applications.

Generative AI
Scale Generative AI With Flexibility and Speed
The race to scale new generative AI capabilities is creating both opportunities for innovation and challenges. Learn how to overcome these challenges and successfully deploy AI on Arm everywhere.
Stay Connected
Subscribe to stay up to date on the latest news, case studies, and insights.