Nvidia Partners with Startup Upscale on Scale-Out Switches
Upscale AI emerged a few months ago as a well-funded startup developing a scale-up switch with an eye on UALink compatibility. Scale-up AI networking standards are new and won’t materialize if only XPU vendors back them: switch suppliers are needed. Meanwhile, Nvidia is deploying new versions of NVLink with every GPU generation and licensing it to other companies, an unusual proliferation of a proprietary technology.
Upscale has just disclosed it’s working on scale-out AI networking as well. Typically, these are Ethernet-based networks, but Nvidia is fond of InfiniBand. While they favor Ethernet, hyperscalers also tend to run their own Layer 3+ protocols to achieve performance and security unavailable from bog-standard UDP or TCP/IP. These require some custom coding at the endpoint and possibly also in the switch, but standard switch ICs can carry this traffic.
Likewise, AI clusters may tweak L3+ and have specific timing demands, but off-the-shelf switch chips can address the market. For those needing to slash switch latency and get a little creative with Ethernet framing, Broadcom offers Tomahawk Ultra, disclosed last summer and discussed by the Byrne-Wheeler Report at https://youtu.be/xxN4WHr0-G0?si=LeFEEN-rNZkXadB2&t=1199
If simultaneously developing scale-up and scale-out switch chips sounds like a lot for a startup to bite off, Upscale AI is doing something more shocking. They’re building a switch system (chassis and SONiC-based software) based on the Nvidia Spectrum-X switch. Nvidia sells switch systems based on this chip, and the Upscale arrangement marks another norm-busting approach to Nvidia networking tech.
Upscale will give hyperscalers, neoclouds, and big enterprises an alternative to Nvidia’s off-the-shelf switch: for example, a design adapted to their specific protocols or data-center design and management. Expect the startup to adapt its stack to meet customers’ needs instead of foisting only a different off-the-shelf solution onto them. This could be particularly valuable to companies with a heterogeneous XPU mix.
It’s still early days for Upscale. They haven’t disclosed product specifics or timelines, and there are plenty of big-picture details, such as NIC/endpoint support, to be mapped out. At least we don’t need to wonder about what ex-Cisco and ex-Juniper people were doing mining cryptocurrency. The Byrne-Wheeler Report discussed Upscale AI here: https://youtu.be/Y9yvfeSaens?si=EeNxS18WHLPq_xcK&t=885