Here’s How NVIDIA’s Blackwell Ultra GB300 AI Racks Are Dominating Long-Context DeepSeek Workloads

By Muhammad Zuhair
Publication Date: 2026-02-21 18:51:00

NVIDIA’s GB300 NVL72 AI racks have been tested on DeepSeek’s latest open-source models, and through fine-tuning and optimized inference, the results are promising indeed.

NVIDIA’s Blackwell Ultra Scores Up to a 1.5x Lead Over GB200 NVL72 in Latency-Sensitive Workloads

With GB300, NVIDIA’s primary focus has been on delivering optimal long-context performance to capitalize on the agentic AI wave. In a recent post, we discussed how Blackwell Ultra delivers a 50x increase in throughput per megawatt over Hopper GPUs through its extreme co-design approach. Now, the Large Model Systems Organization (LMSYS) has tested the GB300 NVL72 for long-context inference, and the results look extremely promising. Note that the testing also includes infrastructure-level software routing, which we’ll discuss next.

Because long-context workloads shift the pressure toward GPU memory, the LMSYS team integrated PD (Prefill-Decode) Disaggregation, a widely used mechanism for serving large token contexts. In simple terms, PD Disaggregation splits the work across separate hardware “nodes” so the two phases of inference stop competing for the same resources. The prefill phase (prompt processing) is compute-bound, while the decode phase (token generation) is memory-bandwidth-bound; running each on dedicated hardware lets both operate closer to their limits, improving throughput at scale.
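To make the split concrete, here is a minimal Python sketch of the routing idea, not LMSYS’s actual implementation: the Request, PrefillWorker, DecodeWorker, and serve names are hypothetical, and the KV cache is stood in by a placeholder string rather than real GPU memory.

```python
import asyncio
from dataclasses import dataclass, field

@dataclass
class Request:
    prompt: str
    max_new_tokens: int
    kv_cache: str | None = None                  # handle to the KV cache built during prefill
    generated: list[str] = field(default_factory=list)

class PrefillWorker:
    """Prompt processing: a single compute-bound pass over the whole
    prompt that produces the KV cache the decode phase will read."""
    async def run(self, req: Request) -> Request:
        # Placeholder for a real forward pass; a production system would
        # materialize the KV cache in GPU memory here.
        req.kv_cache = f"kv-cache({len(req.prompt.split())} prompt tokens)"
        return req

class DecodeWorker:
    """Token generation: memory-bandwidth-bound, one token per step,
    repeatedly reading the KV cache handed over by the prefill pool."""
    async def run(self, req: Request) -> Request:
        assert req.kv_cache is not None, "decode requires a completed prefill"
        for step in range(req.max_new_tokens):
            req.generated.append(f"<tok{step}>")  # stand-in for sampled tokens
        return req

async def serve(req: Request,
                prefill_pool: list[PrefillWorker],
                decode_pool: list[DecodeWorker]) -> Request:
    # The core of PD disaggregation: each phase is routed to its own
    # hardware pool, so long prompts being prefilled never stall token
    # generation for other in-flight requests.
    prefill = prefill_pool[hash(req.prompt) % len(prefill_pool)]
    req = await prefill.run(req)                  # phase 1: build the KV cache
    decode = decode_pool[hash(req.prompt) % len(decode_pool)]
    return await decode.run(req)                  # phase 2: stream tokens

async def main() -> None:
    req = Request(prompt="Explain PD disaggregation in one line.", max_new_tokens=4)
    out = await serve(req, [PrefillWorker(), PrefillWorker()], [DecodeWorker()])
    print(out.generated)

asyncio.run(main())
```

The point of the split is that the two pools can be sized and scheduled independently. In a real deployment, the KV cache must also be physically transferred from the prefill GPUs to the decode GPUs, which is exactly the kind of traffic a rack-scale NVLink domain like the GB300 NVL72’s is designed to absorb.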

The LMSYS team also employed several other optimization techniques, including…