Repost: How is the Memory Crisis Reshaping the AI and Server Worlds? 🧠💻
The latest episode of The Byrne-Wheeler Report is live, and we're digging into the memory pricing and availability crisis (funny how those two go together) currently hitting the tech industry. From $135B in quarterly hyperscaler spending to innovative startups trying to bypass DRAM altogether, this is an episode you can't afford to miss if you're tracking the future of hardware.
Inside the episode:
🎙️ Hosts: Joe Byrne & Bob Wheeler
💡 Special Guests:
Gary Smerdon, CEO of MEXT
Jim Handy, Principal Analyst at Objective Analysis
Key Episode 12 segments:
The main event: A game changer for server costs. Gary Smerdon, CEO of MEXT, joins us to explain how MEXT uses AI-driven software to transparently swap cold memory pages to flash, effectively doubling system memory at a fraction of the cost.
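For the curious, the core idea behind that kind of tiering can be sketched in a few lines. This is our own toy illustration of cold-page demotion, not MEXT's actual implementation (which the episode describes as AI-driven): hot pages live in DRAM, and the least-recently-used pages get demoted to a flash tier when DRAM fills up.

```python
# Toy sketch of transparent memory tiering (our illustration, NOT
# MEXT's implementation): track page recency and demote the coldest
# pages from DRAM to a flash tier when DRAM is over capacity.

class TieredMemory:
    def __init__(self, dram_pages: int):
        self.dram_pages = dram_pages  # DRAM capacity, in pages
        self.dram = {}    # page_id -> (data, last_access_tick)
        self.flash = {}   # page_id -> data (larger, slower tier)
        self._tick = 0    # logical clock for recency tracking

    def access(self, page_id, data=None):
        """Touch a page, faulting it in from flash if needed."""
        self._tick += 1
        if page_id in self.flash:      # "page fault": promote to DRAM
            data = self.flash.pop(page_id)
        elif page_id in self.dram:
            data = self.dram[page_id][0]
        self.dram[page_id] = (data, self._tick)
        # Demote least-recently-used pages while DRAM is over capacity.
        while len(self.dram) > self.dram_pages:
            coldest = min(self.dram, key=lambda p: self.dram[p][1])
            self.flash[coldest] = self.dram.pop(coldest)[0]
        return data
```

The application never sees the tier boundary; it just accesses pages, which is the "transparent" part of the pitch.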
Special guest star: Memory analyst Jim Handy joins the show to share his insights. Why have DRAM prices quadrupled since September? Jim breaks down the trade ratio between HBM and DDR and explains why the shortage might last another two years.
Intro Chatter
The Quantization Revolution: We discuss Prism ML (out of Caltech) and Google’s TurboQuant. Are we moving toward a world of "skinnier weights" where 1-bit precision allows frontier models to run on your MacBook?
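For a sense of scale, here's a quick back-of-envelope sketch (ours, not from the episode) of why "skinnier weights" matter. The 70B parameter count is a hypothetical stand-in for a frontier-class model; the math is just params × bits ÷ 8.

```python
# Back-of-envelope: memory needed to hold a model's weights at
# different precisions. Bytes = params * bits / 8.
def weight_memory_gb(n_params: float, bits_per_weight: float) -> float:
    """Return weight memory in gigabytes (1 GB = 1e9 bytes)."""
    return n_params * bits_per_weight / 8 / 1e9

params = 70e9  # a hypothetical 70B-parameter model
for bits in (16, 8, 4, 1):
    print(f"{bits:>2}-bit: {weight_memory_gb(params, bits):,.2f} GB")
# At 16-bit that's 140 GB of weights; at 1-bit it's 8.75 GB --
# which is why 1-bit precision puts laptop-scale inference in play.
```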
The KV Cache Bottleneck: While quantization shrinks model weights, the memory pressure is shifting to the KV cache, especially for Mixture-of-Experts (MoE) architectures. This is what the 2025 TurboQuant paper addresses: the technique trades compute cycles for KV-cache memory. (Not discussed: recent TurboQuant adaptations that are more computationally efficient, thus impacting token-generation rates.)
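To see why the KV cache becomes the bottleneck at long context lengths, here's a standard sizing formula (our illustration; the example model shape is hypothetical, not taken from the episode or the paper): the cache stores a key and a value vector per layer, per KV head, per token.

```python
# KV cache size = 2 (K and V) * layers * kv_heads * head_dim
#                 * seq_len * batch * bytes_per_element
def kv_cache_gb(n_layers: int, n_kv_heads: int, head_dim: int,
                seq_len: int, batch: int, bytes_per_elem: int) -> float:
    """Return KV cache size in gigabytes (1 GB = 1e9 bytes)."""
    return (2 * n_layers * n_kv_heads * head_dim
            * seq_len * batch * bytes_per_elem) / 1e9

# Hypothetical 80-layer model with 8 KV heads of dim 128, one
# 128k-token sequence, fp16 (2 bytes) cache entries:
print(f"{kv_cache_gb(80, 8, 128, 128_000, 1, 2):.1f} GB")  # ~41.9 GB
```

Roughly 42 GB of cache for a single long sequence, independent of the weights: that's the pressure quantizing the KV cache (at some compute cost) is meant to relieve.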
Other content