TL;DR
- Arm and Meta form a multi-year partnership to optimize AI efficiency across cloud and edge systems.
- Meta to deploy Arm’s Neoverse CPUs for AI ranking and recommendation workloads.
- Collaboration enhances PyTorch and FBGEMM for Arm architectures to reduce energy use and costs.
- Open-source contributions aim to improve AI inference for enterprises globally.
In a move that signals a major shift in how AI workloads are optimized, Meta and Arm have announced a multi-year partnership focused on making artificial intelligence systems more power-efficient and cost-effective.
According to a release Arm published on Wednesday, the collaboration targets both cloud and edge computing, with the goal of optimizing Meta’s widely used AI frameworks, most notably PyTorch, to run seamlessly on Arm’s chip designs.
Meta plans to migrate key AI workloads such as ranking and recommendation systems, which power feeds on Facebook, Instagram, and other platforms, to Arm’s Neoverse-based data center platforms. The transition is expected to cut energy consumption significantly compared with equivalent x86-based deployments.
The deal underscores Meta’s growing emphasis on sustainability and operational efficiency as the company grapples with ballooning AI infrastructure costs and rising energy demands from its large-scale models.
PyTorch, FBGEMM Get Arm Boost
At the heart of the partnership lies a technical collaboration to improve AI inference performance. Meta’s engineers will work with Arm to optimize PyTorch and Facebook General Matrix Multiplication (FBGEMM) for Arm’s vector extensions, ensuring that AI models run faster and use less power.
These optimizations will feed back into the open-source ecosystem, enabling developers worldwide to leverage Arm’s efficiency gains in their own deployments. Early benchmarks suggest that integrating KleidiAI, Arm’s AI-optimized kernel library, can deliver up to 20% better performance in inference workloads.
This upgrade isn’t just about speed. For Meta, every incremental gain in efficiency translates to enormous cost savings, especially when serving AI-powered recommendations to more than 3 billion users daily.
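For developers, most of these gains arrive through paths PyTorch already exposes. As a rough illustration (not Meta’s actual serving code), the sketch below applies int8 dynamic quantization to a small ranking-style model for CPU inference; which low-level kernels actually run depends on the platform and the PyTorch build.

```python
import torch
import torch.nn as nn

# Minimal sketch (not Meta's production code): a tiny ranking-style MLP
# whose Linear layers are dynamically quantized to int8 for CPU inference.
# The kernels that execute underneath (FBGEMM on x86, XNNPACK/KleidiAI-backed
# routines on AArch64, etc.) depend on how this PyTorch build was compiled.
model = nn.Sequential(
    nn.Linear(256, 512),
    nn.ReLU(),
    nn.Linear(512, 1),
).eval()

# Quantize the Linear weights to int8; activations remain float.
quantized = torch.ao.quantization.quantize_dynamic(
    model, {nn.Linear}, dtype=torch.qint8
)

with torch.inference_mode():
    scores = quantized(torch.randn(32, 256))  # score a batch of 32 candidates
print(scores.shape)  # torch.Size([32, 1])
```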
Powering the Next Generation of AI Data Centers
Meta’s shift toward Arm Neoverse CPUs also aligns with the broader industry trend of diversifying away from x86-based architectures. The company will deploy its new systems using Nvidia Grace CPUs, which are built on Arm’s architecture and designed for high-efficiency data center operations.
Each rack-scale Nvidia GB200 or GB300 NVL72 system houses dozens of Arm-based Neoverse V2 CPUs, working in tandem with Nvidia GPUs to handle intensive AI workloads. While large language model (LLM) training remains GPU-bound, tasks like inference, ranking, and personalization will increasingly rely on CPUs optimized for performance per watt.
This architectural shift not only lowers power costs but also enhances Meta’s ability to scale sustainably, which is crucial as the company expands its AI footprint across multiple platforms.
Open Source and Enterprise Impact
The collaboration has implications far beyond Meta’s own ecosystem. Arm and Meta plan to open-source their optimizations, including improvements to PyTorch, FBGEMM, and ExecuTorch, Meta’s lightweight runtime for edge AI inference.
Enterprise developers will soon gain access to migration kits and Docker containers preloaded with Arm-optimized builds, making it easier for businesses to test, benchmark, and deploy AI workloads on Arm-based infrastructure like AWS Graviton, Google Axion, and Microsoft Cobalt.
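Before running full benchmarks on such an instance, a quick sanity check of the environment helps. The sketch below is generic and does not depend on any of the forthcoming migration kits: it confirms the machine is AArch64, prints the backends the installed PyTorch build was compiled with, and times a large matrix multiplication as a crude point of comparison across instance types.

```python
import platform
import time
import torch

# Expect "aarch64" on Arm-based instances such as AWS Graviton.
print(platform.machine())
print(torch.__version__)
print(torch.__config__.show())  # build flags and CPU backends compiled in

# Crude matmul timing; run the same script on different instance types to compare.
x = torch.randn(2048, 2048)
y = torch.randn(2048, 2048)
torch.matmul(x, y)  # warm-up
start = time.perf_counter()
for _ in range(10):
    torch.matmul(x, y)
print(f"avg matmul time: {(time.perf_counter() - start) / 10:.4f} s")
```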
Moreover, early results suggest that Arm-optimized ExecuTorch with KleidiAI can handle 350+ tokens per second in the prefill stage and 40+ tokens per second in the decode stage for quantized LLMs, an attractive prospect for companies seeking affordable, power-efficient AI inference.
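Those two figures describe different phases of text generation: prefill processes the entire prompt in one pass, while decode produces output tokens one at a time. The snippet below is a hypothetical measurement harness, with prefill and decode_step standing in for whatever the runtime actually exposes, showing how such per-stage throughput numbers are typically computed.

```python
import time

def measure_generation_throughput(prefill, decode_step, prompt_tokens, new_tokens=128):
    """Hypothetical helpers: `prefill` consumes the whole prompt and returns model
    state; `decode_step` emits one token from that state. They stand in for
    whatever runtime API is used (ExecuTorch, PyTorch, or otherwise)."""
    t0 = time.perf_counter()
    state = prefill(prompt_tokens)        # prompt handled in a single pass
    t1 = time.perf_counter()
    for _ in range(new_tokens):
        state = decode_step(state)        # autoregressive: one token per step
    t2 = time.perf_counter()
    prefill_tps = len(prompt_tokens) / (t1 - t0)  # tokens/s during prefill
    decode_tps = new_tokens / (t2 - t1)           # tokens/s during decode
    return prefill_tps, decode_tps
```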
By contributing these improvements back to open source, both companies aim to create a more balanced and sustainable AI infrastructure ecosystem.