Google TPUv8: Early Specs and Performance Gains

Google has unveiled its eighth-generation AI accelerator, revealing a design similar to past designs but with greater per-socket throughput, faster networking, and new collective engines. Per-chip gains should improve cost per token and performance, particularly as large language models grow well beyond 1 trillion parameters. As with some previous generations, the new accelerator (NPU) has […]