AI

NVIDIA Spectrum-X — the Open, AI-Native Ethernet Fabric — Sets the Standard for Gigascale AI, Now With MRC

NVIDIA's Spectrum-X Ethernet fabric, now shipping with Multipath Reliable Connection (MRC), is quietly becoming the de facto backbone for gigascale AI clusters, cutting tail latency by 30% while preserving full line-rate throughput. By fusing RoCEv2 with adaptive congestion control and hardware-accelerated telemetry, it lets hyperscalers and cloud builders run distributed training jobs across 32,000 GPUs without the jitter that cripples alternative fabrics. The open, AI-native stack is already live in Microsoft Azure and Oracle Cloud, setting a new bar for what "good enough" networking looks like in the trillion-parameter era.

What MRC does

MRC is an RDMA transport protocol that distributes a single RDMA connection across multiple network paths. Instead of a single-lane road, it creates a street grid with real-time traffic rerouting. This improves throughput, load balancing, and availability for large-scale AI training fabrics. OpenAI, Microsoft, and Oracle have deployed MRC in production. OpenAI's Sachin Katti stated that MRC's end-to-end approach avoided typical network-related slowdowns and interruptions, maintaining the efficiency of frontier training runs at scale.
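
To make the street-grid analogy concrete, here is a minimal Python sketch of the multipath idea: one logical connection's messages are striped across several paths and reassembled in order at the receiver. It is an illustration only; the path names, the load policy, and the classes are invented, not NVIDIA's MRC implementation.

```python
# Illustration only: a toy model of striping one logical connection's messages
# across several paths and reassembling them in order at the receiver.
# Path names, the load policy, and the classes are invented, not NVIDIA's MRC.
from dataclasses import dataclass, field
import itertools

@dataclass
class Path:
    name: str
    congestion: float = 0.0               # 0.0 = idle, 1.0 = saturated
    delivered: list = field(default_factory=list)

class MultipathConnection:
    """One logical connection striped across many physical paths."""
    def __init__(self, paths):
        self.paths = paths
        self.seq = itertools.count()       # connection-wide sequence numbers

    def send(self, payload):
        # Toy policy: put each message on the least-congested path right now.
        path = min(self.paths, key=lambda p: p.congestion)
        path.delivered.append((next(self.seq), payload))
        path.congestion += 0.1             # crude model of the load just added

    def receive_in_order(self):
        # The receiver reassembles the stream by sequence number, regardless
        # of which path carried each message.
        arrivals = [m for p in self.paths for m in p.delivered]
        return [payload for _, payload in sorted(arrivals)]

conn = MultipathConnection([Path("path-0"), Path("path-1"), Path("path-2")])
for i in range(9):
    conn.send(f"chunk-{i}")
print(conn.receive_in_order())             # chunks come back in order
```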

How it works

MRC delivers high GPU utilization by load-balancing traffic across all available paths, ensuring every GPU gets the bandwidth it needs throughout a training run. It sustains high bandwidth even under congestion by dynamically avoiding overloaded paths in real time. When data loss occurs, intelligent retransmission enables rapid, precise recovery, minimizing the impact of short-lived interruptions to long-running jobs and helping avoid GPU idle time. Administrators gain fine-grained visibility and control over traffic paths, simplifying operations and accelerating troubleshooting at scale.
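
The retransmission behavior is worth a sketch of its own. Assuming a selective-repeat style of recovery (an assumption, since the exact mechanism isn't spelled out here), only the messages that were actually lost are resent, and they can be steered onto a healthier path:

```python
# Hedged sketch of selective retransmission across paths: only the messages
# that are still unacknowledged get resent, and they are steered onto the
# path currently believed to be healthiest. Names and policy are assumptions.
class SelectiveRepeatSender:
    def __init__(self, paths):
        self.loss_rate = {p: 0.0 for p in paths}   # path -> observed loss score
        self.unacked = {}                          # seq -> (payload, path used)
        self.next_seq = 0

    def best_path(self):
        # Steer traffic away from paths that have recently lost data.
        return min(self.loss_rate, key=self.loss_rate.get)

    def send(self, payload):
        self.unacked[self.next_seq] = (payload, self.best_path())
        self.next_seq += 1

    def on_ack(self, acked_seqs):
        for seq in acked_seqs:
            self.unacked.pop(seq, None)

    def retransmit_missing(self):
        # Resend only what is still outstanding, never the whole window.
        for seq, (payload, old_path) in list(self.unacked.items()):
            self.loss_rate[old_path] += 0.1        # penalize the lossy path
            self.unacked[seq] = (payload, self.best_path())
            print(f"retransmitting seq {seq} on {self.unacked[seq][1]}")

sender = SelectiveRepeatSender(["path-A", "path-B"])
for i in range(5):
    sender.send(f"msg-{i}")
sender.on_ack([0, 1, 3, 4])        # msg-2 was lost somewhere in the fabric
sender.retransmit_missing()        # only seq 2 goes out again
```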

Failure bypass and multiplane designs

MRC's failure bypass technology detects a network path failure in microseconds and reroutes traffic automatically in hardware. This matters for AI training clusters where thousands of GPUs must stay synchronized, as even a brief network disruption can slow or interrupt an entire training job. Spectrum-X Ethernet prevents that by responding at hardware speed.
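
In software terms, the bypass behaves roughly like the toy model below: when a path is reported dead, every flow mapped to it is remapped onto the surviving paths immediately, rather than waiting for timeouts. The real mechanism runs in switch and SuperNIC hardware; the class and names here are illustrative.

```python
# Toy model of failure bypass: when a path is reported down, the flows mapped
# to it are remapped onto surviving paths right away, instead of waiting for
# timeouts. The real mechanism runs in hardware; these names are made up.
class FailoverFabric:
    def __init__(self, paths):
        self.healthy = set(paths)
        self.flow_to_path = {}

    def assign(self, flow_id):
        # Deterministically spread flows across whatever paths are healthy.
        choices = sorted(self.healthy)
        path = choices[hash(flow_id) % len(choices)]
        self.flow_to_path[flow_id] = path
        return path

    def on_path_failure(self, dead_path):
        # The "bypass": drop the failed path, then immediately remap every
        # flow that was using it.
        self.healthy.discard(dead_path)
        for flow_id, path in list(self.flow_to_path.items()):
            if path == dead_path:
                self.assign(flow_id)

fabric = FailoverFabric(["path-0", "path-1", "path-2"])
for flow in ["gpu0->gpu7", "gpu1->gpu6", "gpu2->gpu5"]:
    fabric.assign(flow)
fabric.on_path_failure("path-1")
print(fabric.flow_to_path)   # no flow is left pointing at the failed path
```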

Another innovation is the multiplane network design, which OpenAI deploys with Spectrum-X Ethernet in conjunction with MRC. A multiplane network consists of multiple independent network fabrics, or planes, each providing an alternate communication path between GPUs. The NVIDIA Spectrum-X Multiplane capability supports hardware-accelerated load balancing across the planes, boosting resiliency and scale without sacrificing performance. This keeps latencies predictably low while scaling to hundreds of thousands of GPUs.
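
A rough sketch of the plane-striping idea, under the simplifying assumption that traffic is steered to the least-loaded usable plane (the actual hardware policy may differ):

```python
# Rough sketch of striping traffic across independent planes, assuming the
# simplest policy of "send on the least-loaded usable plane". Plane names and
# the policy are illustrative; the real balancing is hardware-accelerated.
class MultiplaneFabric:
    def __init__(self, plane_names):
        self.load = {name: 0.0 for name in plane_names}   # bytes in flight
        self.up = {name: True for name in plane_names}

    def pick_plane(self, message_bytes):
        usable = [p for p in self.load if self.up[p]]
        plane = min(usable, key=lambda p: self.load[p])
        self.load[plane] += message_bytes
        return plane

    def on_complete(self, plane, message_bytes):
        self.load[plane] -= message_bytes

fabric = MultiplaneFabric(["plane-0", "plane-1", "plane-2", "plane-3"])
print([fabric.pick_plane(1 << 20) for _ in range(4)])   # spreads over all planes
fabric.up["plane-2"] = False                            # an entire plane goes down
print([fabric.pick_plane(1 << 20) for _ in range(3)])   # the rest absorb the traffic
```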

Open standard and ecosystem

MRC was first proven in production on NVIDIA Spectrum-X Ethernet hardware and has now been released as an open specification through the Open Compute Project. NVIDIA collaborated on MRC development with AMD, Broadcom, Intel, Microsoft, and OpenAI. Customers can choose among Spectrum-X Ethernet Adaptive RDMA, MRC, and other custom protocols, all running natively across NVIDIA ConnectX SuperNICs and Spectrum-X Ethernet switches.

Bottom line

Spectrum-X Ethernet with MRC is the network fabric that lets hyperscalers build AI factories at gigascale without the jitter and downtime that plague alternative approaches. It's open, resilient, and already in production at the largest AI training clusters in the world.

Similar Articles

AI 1 min

Unlocking large-scale AI training networks with MRC (Multipath Reliable Connection)

A breakthrough in high-performance networking has arrived with Multipath Reliable Connection (MRC), a protocol released as an open specification through the Open Compute Project (OCP) that enhances resilience and throughput in massive AI training clusters, potentially unlocking unprecedented scalability for large-scale deep learning workloads. MRC's multipath architecture enables redundant data transmission, mitigating the impact of network failures and bottlenecks. This innovation could significantly accelerate the training of complex AI models.

AI 4 min

Claude Code: The Terminal-Based AI That Runs Your Business While You Sleep

Most Claude users never leave the browser tab. A smaller group has moved to Claude Code, a terminal-based interface that unlocks plugins, scheduled agents, MCPs, and project-aware files. This guide walks through installation, the four modes, slash commands, managed agents, skills, MCPs, and the two files that run an entire business. All for the same $20/month Pro plan.

AI 2 min

Cut Claude Code Costs

Claude Code is a powerful coding tool, but its token usage can quickly add up. By implementing three simple tricks, users can significantly reduce their token usage without compromising on performance. These tricks include using the Opus and Sonnet models efficiently, utilizing subagents for research and exploration, and installing the Caveman plugin. By combining these methods, users can extend their token usage limits and get more out of their Claude Code plan.

AI 3 min

Vercel’s Agent-Browser Replaces Playwright for AI Agents—93% Fewer Tokens

Playwright was designed for human-written tests, not AI agents, leading to slow, expensive workflows that dump full-page screenshots into context windows. Vercel’s agent-browser solves this by feeding models compact accessibility trees instead of pixels, reducing token usage by 93% and accelerating execution. The tool is already a GitHub favorite, with over 31,000 stars, and integrates seamlessly with AI coding assistants like Claude Code.

AI 3 min

Higgsfield MCP Server: Turn Claude Into a Short-Form Ad Factory in 2 Minutes

Higgsfield, a visual generation platform that wraps models like Seedance 2.0, Sora 2, Veo 3.1, Kling 3.0, and Hailuo 02 behind a single interface, shipped an MCP server on April 30, 2026. This lets Claude Desktop users generate short-form ads by simply chatting — no clicking around the Higgsfield UI. Nine curated presets (UGC, unboxing, product review, hyper motion, TV spot, and more) ship out of the box. The workflow collapses creative production from days to minutes, making it realistic for brands to ship the 30+ ad variants per month that Meta's algorithm rewards.

AI 4 min

59 Claude Prompts to Solve Real-Life Problems—Not Just ‘Productivity Hacks’

Claude’s potential is often wasted on generic queries. A curated set of 59 prompts—organized by real-world problems like finance, life admin, and creative problem-solving—helps users extract more value from the AI. The key? Treating Claude as a collaborative tool, not a search engine, and refining outputs through iterative feedback. Here’s how to use them effectively.