Meta documents its CXL experience, including its Vistara CXL chip
Memory capacity has become a defining constraint in modern hyperscale computing. At Meta, approximately 40% of servers are limited not by compute throughput but by DRAM capacity, which restricts both performance and fleet scalability. Compute Express Link (CXL) offers a compelling architectural shift by decoupling memory expansion from CPU memory channels, enabling flexible scaling and the reuse of retired DIMMs. Yet despite six years of research and commercial momentum, CXL’s real-world applicability has remained largely unproven.
Meta’s deployment of Vistara, a custom CXL ASIC, provides the first comprehensive evidence from a large-scale production environment. This paper presents an end-to-end evaluation, from hardware design and OS integration to workload-level results, demonstrating both the practical benefits and the challenges of CXL at hyperscale. It also corrects several misconceptions, including assumptions about long-tail latency behavior and TPP software overheads. In production, Vistara delivers substantial improvements: up to a 25% reduction in server count for disaggregated ML inference and a 29% reduction in average latency for distributed caches. CXL’s impact is no longer hypothetical; it is measurable and significant.