A3 Ultra and A4 VMs MLPerf 5.0 Inference results
For MLPerf™ Inference v5.0, we submitted 15 results, including our first submission with A3 Ultra (NVIDIA H200) and A4 (NVIDIA HGX B200) VMs. The A3 Ultra VM is powered by eight NVIDIA H200 Tensor Core GPUs and offers 3.2 Tbps of GPU-to-GPU non-blocking network bandwidth and twice the high bandwidth memory (HBM) compared to A3 Mega with NVIDIA H100 GPUs. Google Cloud’s A3 Ultra demonstrated highly competitive performance, achieving results comparable to NVIDIA’s peak GPU submissions across LLM, MoE, image, and recommendation models.
Google Cloud was the only cloud provider to submit results on NVIDIA HGX B200 GPUs, demonstrating the excellent performance of A4 VMs for serving LLMs, including Llama 3.1 405B (a new benchmark introduced in MLPerf 5.0). A3 Ultra and A4 VMs both deliver powerful inference performance, a testament to our deep partnership with NVIDIA to provide infrastructure for the most demanding AI workloads.
Customers like JetBrains are using Google Cloud GPU instances to accelerate their inference workloads:
“We’ve been using A3 Mega VMs with NVIDIA H100 Tensor Core GPUs on Google Cloud to run LLM inference across multiple regions. Now, we’re excited to start using A4 VMs powered by NVIDIA HGX B200 GPUs, which we expect will further reduce latency and increase the responsiveness of AI in JetBrains IDEs.” – Vladislav Tankov, Director of AI, JetBrains
AI Hypercomputer is powering the age of AI inference
Google’s innovations in AI inference, including hardware advancements in Google Cloud TPUs and NVIDIA GPUs, plus software innovations such as JetStream, MaxText, and MaxDiffusion, are enabling AI breakthroughs with integrated software frameworks and hardware accelerators. Learn more about using AI Hypercomputer for inference. Then, check out these JetStream and MaxDiffusion recipes to get started today.
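To make the serving pattern concrete, here is a minimal, illustrative JAX sketch of greedy autoregressive decoding, the core loop that inference engines such as JetStream optimize with real models, KV caches, and continuous batching. The toy model and every name in it are hypothetical stand-ins for illustration, not the JetStream or MaxText API.

```python
# Minimal greedy-decoding sketch in JAX. Everything here is a toy stand-in:
# real serving stacks (e.g., JetStream) use trained models, KV caches,
# fixed-size token buffers, and continuous batching instead.
import jax
import jax.numpy as jnp

VOCAB, DIM = 32_000, 128  # toy vocabulary and hidden sizes

def init_params(key):
    k1, k2 = jax.random.split(key)
    return {
        "embed": jax.random.normal(k1, (VOCAB, DIM)) * 0.02,
        "proj": jax.random.normal(k2, (DIM, VOCAB)) * 0.02,
    }

def next_token_logits(params, tokens):
    # Toy "model": mean-pool the token embeddings, project to vocab logits.
    h = params["embed"][tokens].mean(axis=0)
    return h @ params["proj"]

@jax.jit
def decode_step(params, tokens):
    # Greedy choice of the next token; production servers also support sampling.
    return jnp.argmax(next_token_logits(params, tokens))

def greedy_decode(params, prompt, max_new_tokens=8):
    # Note: growing the sequence retriggers JIT compilation at each new length;
    # real engines decode into fixed-length buffers to avoid this.
    tokens = jnp.asarray(prompt)
    for _ in range(max_new_tokens):
        tokens = jnp.append(tokens, decode_step(params, tokens))
    return tokens

print(greedy_decode(init_params(jax.random.PRNGKey(0)), [1, 2, 3]))
```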