OpenAI has introduced a new networking protocol called Multipath Reliable Connection (MRC), developed in partnership with major technology companies including NVIDIA, Microsoft, AMD, Intel, and Broadcom. The company says the protocol will help make AI supercomputer networks faster, more reliable, and more efficient during the training of advanced AI models.
The announcement comes as OpenAI scales its infrastructure to support growing demand for ChatGPT. The company recently said that more than 900 million people now use the chatbot every week, increasing the need for high-performance systems capable of processing large volumes of data without delays.
MRC is a networking protocol designed to improve communication between GPUs in large AI training clusters. OpenAI described it as a “novel protocol” built into the latest 800Gb/s network interfaces to boost performance and resilience during AI model training.
Traditional AI networking typically routes each data flow over a single network path. Congestion or hardware failures on that path can slow training or interrupt workloads entirely. OpenAI says MRC addresses this issue by distributing data packets across hundreds of network paths simultaneously.
The company said this approach reduces congestion and allows systems to quickly bypass failed connections, helping training operations continue without major interruptions.
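The idea can be illustrated with a minimal sketch. The class and behavior below are our own illustration of generic multipath spraying with failover, not OpenAI's actual MRC implementation: packets are spread round-robin across all healthy paths, so a failed path is simply skipped rather than blocking traffic.

```python
# Illustrative sketch of the multipath idea (our names, not OpenAI's):
# spray packets across many paths and route around any failed ones.

class MultipathSender:
    def __init__(self, num_paths):
        self.paths = list(range(num_paths))
        self.failed = set()  # paths currently known to be down

    def mark_failed(self, path):
        self.failed.add(path)

    def healthy_paths(self):
        return [p for p in self.paths if p not in self.failed]

    def send(self, packets):
        # Assign packets round-robin across healthy paths, so congestion
        # or failure on one path affects only a fraction of the traffic.
        paths = self.healthy_paths()
        return {pkt: paths[i % len(paths)] for i, pkt in enumerate(packets)}

sender = MultipathSender(num_paths=8)
sender.mark_failed(3)                      # one path goes down
plan = sender.send(packets=list(range(16)))
assert 3 not in plan.values()              # no packet uses the failed path
```

A real protocol would also handle reordering, retransmission, and per-path congestion signals; the sketch only shows the load-spreading and failover principle.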
Training advanced AI models requires a massive computing infrastructure where GPUs constantly exchange information. OpenAI explained that a single training step can involve millions of data transfers across thousands of GPUs.
Even a single delayed transfer can leave expensive GPUs idle, wasting computing resources and increasing training time. OpenAI said MRC aims to reduce these bottlenecks by delivering more predictable network performance, even in the presence of failures.
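The straggler effect described above follows from synchronous training: every GPU must wait for the slowest transfer in the step. A back-of-the-envelope sketch (the numbers are hypothetical, chosen only to illustrate the arithmetic):

```python
# Illustrative only: in a synchronous training step, step time is the
# maximum of the per-GPU transfer times, so one straggler stalls all GPUs.

normal_ms = [10, 10, 10, 10]        # typical per-GPU transfer times
step_time = max(normal_ms)          # 10 ms: all GPUs finish together

delayed_ms = [10, 10, 10, 250]      # one transfer hits congestion
stalled_step = max(delayed_ms)      # 250 ms: every GPU idles meanwhile

# Fraction of the step spent idle because of the single slow transfer:
waste = 1 - step_time / stalled_step
print(f"step stalled {stalled_step - step_time} ms, {waste:.0%} idle")
# → step stalled 240 ms, 96% idle
```

This is why OpenAI emphasizes predictable tail latency rather than raw bandwidth: the cost of one slow path is paid by every GPU in the step.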
“Our goal was not just to build a fast network, but also to build one that delivers very predictable performance, even in the presence of failures, to keep training jobs moving,” the company said in a blog post.
OpenAI said the protocol has already been deployed on its largest NVIDIA GB200 supercomputers. The company has also shared the MRC specification through the Open Compute Project, allowing other companies and researchers to adopt the technology.