
MLPerf Training Results Showcase Unprecedented Performance, Scalability



The full-stack NVIDIA accelerated computing platform has once again demonstrated exceptional performance in the latest MLPerf Training v4.0 benchmarks.

NVIDIA more than tripled its performance on the large language model (LLM) benchmark, based on GPT-3 175B, compared to the record-setting NVIDIA submission made last year. Using an AI supercomputer featuring 11,616 NVIDIA H100 Tensor Core GPUs connected with NVIDIA Quantum-2 InfiniBand networking, NVIDIA achieved this remarkable feat through larger scale (more than triple the 3,584 H100 GPUs submitted a year ago) and extensive full-stack engineering.

Thanks to the scalability of the NVIDIA AI platform, Eos can now train massive AI models like GPT-3 175B even faster, and this strong AI performance translates into significant business opportunities. For example, in NVIDIA's recent earnings call, we described how LLM service providers can turn a single dollar invested into seven dollars in just four years running the Llama 3 70B model on NVIDIA HGX H200 servers. This return assumes an LLM service provider serving Llama 3 70B at $0.60 per million tokens, with an HGX H200 server throughput of 24,000 tokens/second.
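As a rough back-of-the-envelope check on that claim, the sketch below simply multiplies the stated throughput and token price over four years. The price and throughput figures come from the article; the server cost is an illustrative assumption (not an NVIDIA figure), chosen only to show how a roughly 7x return could fall out.

```python
# Back-of-envelope check of the "$1 in, $7 out over four years" claim.
# PRICE and THROUGHPUT are from the article; the server cost is a
# hypothetical illustrative value, not a published NVIDIA number.

PRICE_PER_MILLION_TOKENS = 0.60   # $ per 1M tokens served (from the article)
TOKENS_PER_SECOND = 24_000        # HGX H200 server throughput (from the article)
YEARS = 4
ASSUMED_SERVER_COST = 260_000     # $, illustrative assumption only

seconds = YEARS * 365 * 24 * 3600
total_tokens = TOKENS_PER_SECOND * seconds            # ~3.03e12 tokens
revenue = total_tokens / 1e6 * PRICE_PER_MILLION_TOKENS

print(f"Tokens served over {YEARS} years: {total_tokens:.3e}")
print(f"Revenue: ${revenue:,.0f}")                    # ~$1.8M
print(f"Return per dollar of assumed cost: {revenue / ASSUMED_SERVER_COST:.1f}x")
```

Note that this assumes continuous serving at full throughput; real utilization, power and operating costs would shift the exact ratio.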

NVIDIA H200 GPU Supercharges Generative AI and HPC 

The NVIDIA H200 Tensor Core GPU builds on the strength of the Hopper architecture, with 141GB of HBM3e memory and over 40% more memory bandwidth than the H100 GPU. Pushing the boundaries of what's possible in AI training, the NVIDIA H200 Tensor Core GPU extended the H100's performance by up to 47% in its MLPerf Training debut.

NVIDIA Software Drives Unmatched Performance Gains

Additionally, our submissions using a 512 H100 GPU configuration are now up to 27% faster than just one year ago, thanks to numerous optimizations to the NVIDIA software stack. This improvement highlights how continuous software enhancements can significantly boost performance, even on the same hardware.

This work also delivered nearly perfect scaling. As the number of GPUs increased by 3.2x, going from 3,584 H100 GPUs last year to 11,616 H100 GPUs in this submission, so did the delivered performance.
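To make "nearly perfect scaling" concrete, scaling efficiency is the measured speedup divided by the increase in GPU count. The sketch below uses the GPU counts from the article; the speedup value is a placeholder standing in for "more than tripled", since the article gives no precise figure.

```python
# Scaling efficiency = speedup / scale factor. Values near 100% mean
# performance grew almost linearly with GPU count.

gpus_last_year, gpus_this_year = 3_584, 11_616
scale_factor = gpus_this_year / gpus_last_year   # ~3.24x more GPUs
speedup = 3.2  # placeholder for "more than tripled" performance

efficiency = speedup / scale_factor
print(f"Scale factor: {scale_factor:.2f}x")
print(f"Scaling efficiency: {efficiency:.0%}")   # close to 100%
```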

Learn more about these optimizations on the NVIDIA Technical Blog.

Excelling at LLM Fine-Tuning

As enterprises seek to customize pretrained large language models, LLM fine-tuning is becoming a key industry workload. MLPerf introduced a new LLM fine-tuning benchmark this round, based on the popular low-rank adaptation (LoRA) technique applied to Meta Llama 2 70B; a minimal sketch of the idea follows below.
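For readers new to the technique, here is a minimal PyTorch sketch of the LoRA idea (illustrative only; the benchmark implementation is far more involved). The pretrained weights are frozen, and training updates only a small low-rank correction, which is what makes fine-tuning a 70B-parameter model tractable.

```python
import torch
import torch.nn as nn

class LoRALinear(nn.Module):
    """A frozen pretrained linear layer plus a trainable low-rank update.

    Effective weight: W + (alpha / rank) * B @ A, where A and B are small.
    """

    def __init__(self, base: nn.Linear, rank: int = 16, alpha: float = 32.0):
        super().__init__()
        self.base = base
        for p in self.base.parameters():          # freeze pretrained weights
            p.requires_grad_(False)
        # A is initialized with small noise, B with zeros, so training
        # starts from the unmodified pretrained behavior.
        self.lora_a = nn.Parameter(torch.randn(rank, base.in_features) * 0.01)
        self.lora_b = nn.Parameter(torch.zeros(base.out_features, rank))
        self.scale = alpha / rank

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # Pretrained path plus the scaled low-rank correction.
        return self.base(x) + self.scale * (x @ self.lora_a.T @ self.lora_b.T)
```

With a rank far smaller than the layer dimensions, the trainable parameter count drops by orders of magnitude compared with full fine-tuning, which is why the workload scales so well.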

The NVIDIA platform excelled at this task, scaling from eight to 1,024 GPUs, with the largest-scale NVIDIA submission completing the benchmark in a record 1.5 minutes.

Accelerating Stable Diffusion and GNN Training

NVIDIA also accelerated Stable Diffusion v2 training performance by up to 80% at the same system scales submitted last round. These advances reflect numerous improvements to the NVIDIA software stack, showcasing how software and hardware enhancements go hand in hand to deliver top-tier performance.

On the new graph neural network (GNN) test based on R-GAT, the NVIDIA platform with H100 GPUs excelled at both small and large scales. The H200 delivered a 47% boost on single-node GNN training compared to the H100. This showcases the powerful performance and high efficiency of NVIDIA GPUs, which make them ideal for a wide range of AI applications.

Broad Ecosystem Support

Reflecting the breadth of the NVIDIA AI ecosystem, 10 NVIDIA partners submitted results, including ASUS, Dell Technologies, Fujitsu, GIGABYTE, Hewlett Packard Enterprise, Lenovo, Oracle, Quanta Cloud Technology, Supermicro and Sustainable Metal Cloud. This broad participation, along with their own impressive benchmark results, underscores the widespread adoption of and trust in NVIDIA's AI platform across the industry.

MLCommons' ongoing work to bring benchmarking best practices to AI computing is vital. By enabling peer-reviewed comparisons of AI and HPC platforms, and keeping pace with the rapid changes that characterize AI computing, MLCommons provides companies everywhere with crucial data that can help guide important purchasing decisions.

And with the NVIDIA Blackwell platform, next-level AI performance on trillion-parameter generative AI models for both training and inference is coming soon.
