s/CPX/LPX/g

It seems like only a few months ago that @NVIDIA showed off Rubin CPX, a system combining the now-in-production Rubin data-center GPU with an RTX-like GPU that employed GDDR DRAM instead of #HBM. Well, that's not the system you're looking for. [Waves like a Jedi.] Today, Jensen announced LPX. As I had speculated, it's basically CPX but with Groq LPUs (NPUs) in place of the RTX-like chips. The LPU is the new LP30 design. A midlife kicker, LP35, is coming that supports NVFP4, Nvidia's FP4 data format. The GPU-LPU interconnect is a special low-latency version of Ethernet. I don't know whether that choice was made only for faster time to market than adding NVLink would allow (assuredly the case for LP30) or whether it is here to stay. The target is premium-tier customers requiring fast token rates. Jensen mooted 1000 TPS/user. The challenge for GPUs has been the tradeoff between throughput and per-user token rate (essentially, latency). Batching is good for the former but bad for the latter. The LPU keeps throughput from collapsing as token rate increases. The LP30 has 500 MB of SRAM, which is great for latency but too little capacity for modern workloads' KV caches, hence the partitioning of LLM phases between the 288 GB GPU and the 500 MB LPU. Speaking of DDR, Vera will talk to LPDDR instead of DDR, which should provide greater bandwidth at lower cost but without the DIMM-based expandability many applications require.
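To make the batching tradeoff concrete, here is a toy sketch, not a model of any real GPU or LPU. It assumes each decode step costs a fixed time to stream the weights plus a small per-sequence cost; the function name `decode_rates` and both timing parameters are hypothetical values chosen only for illustration.

```python
def decode_rates(batch_size, weight_read_ms=10.0, per_seq_ms=0.2):
    """Return (per_user_tps, aggregate_tps) for one decode step in a toy model.

    weight_read_ms and per_seq_ms are made-up numbers, not measurements.
    """
    # Every step pays a fixed cost plus a cost that grows with the batch.
    step_ms = weight_read_ms + per_seq_ms * batch_size
    # Each user receives one token per step.
    per_user_tps = 1000.0 / step_ms
    # The system emits one token per user per step.
    aggregate_tps = batch_size * per_user_tps
    return per_user_tps, aggregate_tps


if __name__ == "__main__":
    for batch in (1, 8, 64, 256):
        per_user, aggregate = decode_rates(batch)
        print(f"batch={batch:4d}  per-user={per_user:7.1f} tok/s  "
              f"aggregate={aggregate:9.1f} tok/s")
```

In this sketch, larger batches raise aggregate throughput while lowering each user's token rate. Assuming one token per decode step, 1000 TPS/user would require every step to finish within 1 ms, which under these assumptions means shrinking the fixed per-step cost rather than adding batch.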

Other contents

Meta Bares MTIA Roadmap, Accelerates NPU Development

Byrne-Wheeler Report Discusses AI Deals, Broadcom and Nvidia Earnings

s/CPX/LPX/g

CPU > GPU ? Vera : Rubin

AWS and Cerebras Team Up on AI Inference

FT Also Reports Nvidia Will Announce a Groq-Based Chip at GTC

Meta Has a Lot Riding on the MTIA

Nvidia Partners with Startup Upscale on Scale-Out Switches

Ubitium's Universal Processor Challenges Conventional Wisdom

Third-Gen Ceva PentaG Targets Satcom and the IoT
