NVIDIA's Blackwell Architecture Dominates MLPerf Training Benchmarks

Published: November 13, 2025
Category: Major Tech Companies
Word Count: 339 words

Full Transcript

NVIDIA's Blackwell architecture has achieved remarkable success in the latest MLPerf Training v5.1 benchmarks, delivering the fastest training times across all tested models. According to the NVIDIA Developer Blog, the architecture, which underpins both the Blackwell and Blackwell Ultra GPUs, swept every benchmark in the round. For instance, pretraining of the 405-billion-parameter Llama 3.1 model was completed in just ten minutes using 5,120 Blackwell GPUs, a substantial improvement over previous submissions. The round showcased not only peak performance but also innovation in low-precision AI data formats, particularly the newly introduced NVFP4 format. Blackwell GPUs provide peak FP4 throughput per clock that is twice that of FP8, and Blackwell Ultra GPUs raise it to three times FP8.
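
To make the NVFP4 idea concrete, below is a minimal NumPy sketch of block-scaled 4-bit quantization: values are rounded to the FP4 (E2M1) grid, with one scale per small block to preserve dynamic range. The 16-element block size and the plain float scale are illustrative assumptions; the real NVFP4 format and NVIDIA's Tensor Core kernels are hardware-level and differ in details such as the scale encoding.

```python
import numpy as np

# Illustrative block-scaled 4-bit quantization in the spirit of NVFP4.
# NOT NVIDIA's implementation: block size and scale format are assumptions.

E2M1_GRID = np.array([0.0, 0.5, 1.0, 1.5, 2.0, 3.0, 4.0, 6.0])
E2M1_GRID = np.concatenate([-E2M1_GRID[::-1], E2M1_GRID])  # signed FP4 values
BLOCK = 16  # assumed micro-block size

def quantize_fp4_blocked(x: np.ndarray):
    """Quantize a 1-D array (length divisible by BLOCK) to FP4 values
    with one scale per block."""
    x = x.reshape(-1, BLOCK)
    # Scale each block so its max magnitude maps to the top of the FP4 range (6.0).
    scales = np.abs(x).max(axis=1, keepdims=True) / 6.0
    scales = np.where(scales == 0, 1.0, scales)  # avoid division by zero
    scaled = x / scales
    # Round each element to the nearest representable E2M1 value.
    idx = np.abs(scaled[..., None] - E2M1_GRID).argmin(axis=-1)
    return E2M1_GRID[idx], scales

def dequantize(q: np.ndarray, scales: np.ndarray) -> np.ndarray:
    """Reconstruct approximate full-precision values from FP4 + scales."""
    return (q * scales).reshape(-1)

x = np.random.randn(64).astype(np.float32)
q, s = quantize_fp4_blocked(x)
err = np.abs(dequantize(q, s) - x).mean()
print(f"mean abs quantization error: {err:.4f}")
```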

The MLPerf Training v5.1 benchmarks measured the time to train seven different models, including Llama 2 and FLUX, highlighting the flexibility and efficiency of NVIDIA's offerings. Blackwell GPUs, for example, fine-tuned the Llama 2 70B model in just 0.40 minutes. NVIDIA was also the only platform to submit results for all benchmarks, underscoring the comprehensive capabilities of its training stack. Architectural innovations, including advancements to the Tensor Cores and to Softmax operations, further accelerated the training of large language models.
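
The Softmax operation mentioned above is the reduction at the heart of attention, computed once per query row throughout LLM training. The snippet below is an illustrative, numerically stable reference version of that operation, not NVIDIA's accelerated kernel:

```python
import numpy as np

# Reference softmax as used in attention. Hardware kernels fuse and
# accelerate this pattern; this sketch only shows the math being computed.

def softmax(scores: np.ndarray, axis: int = -1) -> np.ndarray:
    """Row-wise softmax with the max-subtraction trick to avoid overflow."""
    shifted = scores - scores.max(axis=axis, keepdims=True)
    exp = np.exp(shifted)
    return exp / exp.sum(axis=axis, keepdims=True)

# Attention-style usage: one softmax per query row over the key scores.
scores = np.random.randn(4, 8)  # (queries, keys)
weights = softmax(scores)
assert np.allclose(weights.sum(axis=-1), 1.0)
```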

Overall, the Blackwell architecture's performance gains come from a combination of hardware and software innovations, including optimized algorithms and more efficient use of memory. NVIDIA's submission also used the latest Quantum-X800 networking platform, the first appearance of 800 Gb/s networking in MLPerf Training submissions, providing high-speed connectivity among GPUs. These results matter to developers and companies that depend on high-performance computing resources for AI applications. As NVIDIA continues to innovate on an annual cadence, the Blackwell architecture sets a new standard for future AI training benchmarks, improving performance while reducing training cost and time. This steady advancement not only solidifies NVIDIA's leadership in AI chip technology but also paves the way for further breakthroughs in artificial intelligence.
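
As a quick sanity check on the networking figure cited above, 800 Gb/s of link bandwidth works out to 100 GB/s of raw throughput per link per direction; only the 800 Gb/s figure comes from the transcript, and per-GPU topology details are outside what it states:

```python
# Back-of-the-envelope conversion of the Quantum-X800 link speed.
link_gbps = 800                    # gigabits per second, from the transcript
bytes_per_s = link_gbps * 1e9 / 8  # bits -> bytes: 100 GB/s per direction
print(f"{bytes_per_s / 1e9:.0f} GB/s per link per direction")
```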
