SANTA CLARA — NVIDIA has announced that xAI’s Colossus supercomputer cluster in Memphis, Tennessee, which comprises 100,000 NVIDIA Hopper GPUs, reached that scale by using the NVIDIA Spectrum-X™ Ethernet networking platform for its Remote Direct Memory Access (RDMA) network. Spectrum-X is designed to deliver high performance to multi-tenant, hyperscale AI facilities.
Colossus, billed as the world’s largest AI supercomputer, is used primarily to train xAI’s Grok family of large language models, which power chatbots available to X Premium subscribers. xAI plans to double the size of Colossus to 200,000 GPUs, expanding its capacity for AI model training.
Construction of the facility housing the supercomputer was completed in just 122 days, whereas systems of this scale typically take many months or even years to build. Only 19 days passed between the arrival of the first rack and the start of training, underscoring the speed of the collaboration between xAI and NVIDIA.
While training the Grok models, Colossus has experienced zero latency degradation and zero packet loss across its entire network fabric. The system has sustained 95% data throughput, enabled by Spectrum-X congestion control. By comparison, standard Ethernet at this scale often suffers flow collisions, in which multiple traffic flows contend for the same link, and achieves only around 60% throughput.
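To put the two throughput figures in perspective, the arithmetic can be sketched in a few lines of Python. This is only an illustration: the 400 Gb/s per-link rate is an assumed placeholder rather than a figure from the announcement, and the function name is hypothetical. Only the 95% and 60% utilization values come from the text above.

```python
def effective_throughput_gbps(link_rate_gbps: float, utilization: float) -> float:
    """Return usable bandwidth for a link running at a fraction of line rate."""
    if not 0.0 <= utilization <= 1.0:
        raise ValueError("utilization must be between 0 and 1")
    return link_rate_gbps * utilization


# Assumption: nominal per-link rate chosen purely for illustration.
LINK_RATE_GBPS = 400.0

# Utilization figures quoted in the article.
SPECTRUM_X_UTILIZATION = 0.95
STANDARD_ETHERNET_UTILIZATION = 0.60

spectrum_x_gbps = effective_throughput_gbps(LINK_RATE_GBPS, SPECTRUM_X_UTILIZATION)
standard_gbps = effective_throughput_gbps(LINK_RATE_GBPS, STANDARD_ETHERNET_UTILIZATION)

# The ratio does not depend on the assumed link rate.
throughput_ratio = spectrum_x_gbps / standard_gbps

print(f"Spectrum-X effective rate:        {spectrum_x_gbps:.1f} Gb/s")
print(f"Standard Ethernet effective rate: {standard_gbps:.1f} Gb/s")
print(f"Ratio: {throughput_ratio:.2f}x")
```

Whatever link rate is assumed, the ratio works out to 0.95 / 0.60, or roughly 1.58 times the usable bandwidth on the same hardware.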
Gilad Shainer, NVIDIA’s senior vice president of networking, emphasized the mission-critical nature of AI, highlighting the need for systems that offer superior performance, security, scalability, and cost-efficiency. He noted that Spectrum-X is designed to help innovators like xAI by accelerating AI processing, analysis, and execution, ultimately reducing the time to market for AI solutions.
Elon Musk, in a statement on X, praised Colossus as the most powerful AI training system globally, acknowledging the contributions of xAI, NVIDIA, and their partners. A spokesperson from xAI also expressed pride in building the world’s largest supercomputer, noting that the combination of NVIDIA’s Hopper GPUs and Spectrum-X technology has enabled them to break new ground in AI model training, establishing a super-optimized AI production facility.
At the core of the Spectrum-X platform is the Spectrum SN5600 Ethernet switch, which supports port speeds of up to 800Gb/s. The switch is paired with NVIDIA BlueField-3® SuperNICs. Spectrum-X features such as adaptive routing, congestion control, and enhanced AI fabric visibility provide scalable bandwidth and low latency, both of which are essential for large, multi-tenant generative AI environments, including enterprise deployments.
The collaboration between xAI and NVIDIA shows how Ethernet-based networking can support AI training at the scale of 100,000 GPUs, with a planned expansion to 200,000.