AI/ML and HPC data-load acceleration
Hyperdisk ML is specifically optimized for accelerating data load times for inference, training, and HPC workloads, accelerating model load time by 3-5x compared to common alternatives4. Hyperdisk ML is particularly well-suited for serving tasks compared to other storage services on Google Cloud because it can concurrently provide exceptionally high aggregate throughput to many VMs (up to 1.2 TiB/s of aggregate throughput per volume, more than 100x higher performance than competitive offerings)5. You write once (up to 64 TiB per disk) and attach multiple VM instances to the same volume in read-only mode. With Hyperdisk ML, you can accelerate data load times for your most expensive compute resources, like GPUs and TPUs. For more, check out g.co/cloud/storage-design-ai.
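Here's a minimal sketch of that write-once, read-many pattern using the google-cloud-compute Python client. The project, zone, disk, and VM names are placeholders; in practice you would create the volume read-write, load your model data onto it, and only then switch it to read-only-many mode, a step elided here.

```python
from google.cloud import compute_v1

# Placeholder project, zone, and resource names for illustration only.
PROJECT = "my-project"
ZONE = "us-central1-a"

disks = compute_v1.DisksClient()
instances = compute_v1.InstancesClient()

# Create a Hyperdisk ML volume. Throughput is provisioned independently of
# capacity and is shared across all attached readers.
disk = compute_v1.Disk(
    name="model-weights",
    size_gb=1024,
    type_=f"zones/{ZONE}/diskTypes/hyperdisk-ml",
    provisioned_throughput=6400,   # MiB/s of aggregate read throughput
    access_mode="READ_ONLY_MANY",  # in a real flow, set after loading data
)
disks.insert(project=PROJECT, zone=ZONE, disk_resource=disk).result()

# Attach the same volume read-only to a fleet of GPU serving VMs so each
# node can stream the model weights at load time.
for vm in ["gpu-node-0", "gpu-node-1", "gpu-node-2"]:
    attachment = compute_v1.AttachedDisk(
        source=f"projects/{PROJECT}/zones/{ZONE}/disks/model-weights",
        mode="READ_ONLY",
    )
    instances.attach_disk(
        project=PROJECT,
        zone=ZONE,
        instance=vm,
        attached_disk_resource=attachment,
    ).result()
```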
“At Resemble AI, we leverage our proprietary deep-learning models to generate high-quality AI audio through text-to-speech and speech-to-speech synthesis. By combining Google Cloud’s A3 VMs with NVIDIA H100 GPUs and Hyperdisk ML, we’ve achieved significant improvements in our training workflows. Hyperdisk ML has drastically improved our data loader performance, enabling 2x faster epoch cycles compared to comparable solutions. This acceleration has empowered our engineering team to experiment more freely, train at scale, and accelerate the path from prototype to production.” – Zohaib Ahmed, CEO, Resemble AI
“Abridge AI is revolutionizing medical documentation by leveraging generative AI to summarize patient-clinician conversations in real time. By adopting Hyperdisk ML, we’ve accelerated model loading speeds by up to 76% and reduced pod initialization times.” – Taruj Goyal, Software Engineer, Abridge
High-capacity analytics workloads
For large-scale data analytics workloads like Hadoop and Kafka, which are less sensitive to disk latency fluctuations, Hyperdisk Throughput provides a cost-effective solution with high throughput. Its low cost per GiB and configurable throughput are ideal for processing large volumes of data with low TCO.
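As a rough sketch of that low-TCO knob, here is how provisioning a Hyperdisk Throughput volume could look with the google-cloud-compute Python client; the names and numbers are illustrative, chosen to show capacity and throughput being set independently.

```python
from google.cloud import compute_v1

# Illustrative sketch: a large, inexpensive volume for a Hadoop/Kafka data
# node, with only as much throughput as the pipeline actually needs.
disk = compute_v1.Disk(
    name="hadoop-data-0",        # placeholder disk name
    size_gb=16384,               # 16 TiB of low-cost capacity
    type_="zones/us-central1-a/diskTypes/hyperdisk-throughput",
    provisioned_throughput=400,  # MiB/s, tunable as the workload grows
)
compute_v1.DisksClient().insert(
    project="my-project",        # placeholder project ID
    zone="us-central1-a",
    disk_resource=disk,
).result()
```

Because provisioned throughput can be changed after creation, you can start low and raise it as the pipeline grows rather than over-provisioning up front.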
How to size and set up your Hyperdisk
To select and size the right Hyperdisk volume types for your workload, answer a few key questions:
- Storage management. Decide whether you want to manage the block storage for your workloads in a pool or individually. If your workload will have more than 10 TiB of capacity in a single project and zone, you should consider using Hyperdisk Storage Pools to lower your TCO and simplify planning. Note that Storage Pools do not affect disk performance, but some data protection features, such as Replication and High Availability, are not supported in Storage Pools.
- Latency. If your workload requires SSD-like latency (i.e., sub-millisecond), it should likely be served by Hyperdisk Balanced or Hyperdisk Extreme.
- IOPS or throughput. If your application requires less than 160K IOPS or 2.4 GiB/s of throughput from a single volume, Hyperdisk Balanced is a great fit. If it needs more than that, consider Hyperdisk Extreme.
- Sizing performance and capacity. Hyperdisk offers independently configurable capacity and performance, allowing you to pay for just the resources you need. You can leverage this capability to lower your TCO by understanding how much capacity your workload needs (i.e., how much data, in GiB or TiB, is stored on the disks that serve this workload) and the peak IOPS and throughput of those disks. If the workload is already running on Google Cloud, you can see many of these metrics in the console under Metrics Explorer (see the sizing sketch after this list).
Another important consideration is the level of business continuity and data protection required for your workloads. Different workloads have different Recovery Point Objective (RPO) and Recovery Time Objective (RTO) requirements, each with different costs. Think about your workload tiers when making data-protection decisions. The more critical an application or workload, the lower the tolerance for data loss and downtime. Applications critical to business operations likely require zero RPO and RTO on the order of seconds. Hyperdisk business continuity and data protection helps customers meet the performance, capacity, cost-efficiency, and resilience requirements they demand, and helps them address their financial regulatory needs globally.
Here are a few questions to consider when deciding which variety of Hyperdisk to use for a workload:
- How do I protect my workloads from attacks and malicious insiders? Use Google Cloud Backup vault for cyber resilience, backup immutability and indelibility, and managed backup reporting and compliance. If you want to self-manage your own backups, Hyperdisk standard snapshots are an option for your workloads.
- How do I protect data from user errors and bad upgrades cost-effectively, with low RPO/RTO? You can use point-in-time recovery with Instant Snapshots. This feature minimizes the risk of data loss from user error and bad upgrades with ultra-low RPO and RTO, since creating a checkpoint is nearly instantaneous (see the sketch after this list).
- How do I easily deploy my critical workload (e.g., MySQL) with resilience across multiple zones? You can utilize Hyperdisk HA. It is a great fit for scenarios that require high availability and fast failover, such as SQL Server deployments that leverage failover clustering. For such workloads, you can also choose our new Hyperdisk Balanced High Availability capability with multi-writer support, which lets you run clustered compute with workload-optimized storage across two zones with RPO=0 synchronous replication.
- When a disaster occurs, how do I recover my workload elsewhere quickly and reliably, and run drills to confirm my recovery process? Utilize our disaster recovery capabilities with Hyperdisk Async Replication, which enables cross-region continuous replication and recovery from a regional failure, with fast validation support for disaster recovery drills via cloning. Further, consistency group policies help ensure that workload data distributed across multiple disks is recoverable when a workload needs to fail over between regions.
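To make the point-in-time recovery option above concrete, here is a minimal sketch of the checkpoint-and-restore flow with Instant Snapshots. It assumes the google-cloud-compute Python client's instant snapshot support, and all resource names are placeholders.

```python
from google.cloud import compute_v1

# Placeholder project, zone, and resource names.
PROJECT = "my-project"
ZONE = "us-central1-a"

# Take a near-instant, in-zone checkpoint of the data disk before a risky
# change such as a database upgrade.
snaps = compute_v1.InstantSnapshotsClient()
checkpoint = compute_v1.InstantSnapshot(
    name="pre-upgrade-checkpoint",
    source_disk=f"projects/{PROJECT}/zones/{ZONE}/disks/mysql-data",
)
snaps.insert(
    project=PROJECT, zone=ZONE, instant_snapshot_resource=checkpoint
).result()

# If the upgrade goes wrong, restore by creating a fresh disk from the
# checkpoint; recovery stays in-zone, which keeps RTO low.
restored = compute_v1.Disk(
    name="mysql-data-restored",
    type_=f"zones/{ZONE}/diskTypes/hyperdisk-balanced",
    source_instant_snapshot=(
        f"projects/{PROJECT}/zones/{ZONE}/instantSnapshots/pre-upgrade-checkpoint"
    ),
)
compute_v1.DisksClient().insert(
    project=PROJECT, zone=ZONE, disk_resource=restored
).result()
```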
In short, Hyperdisk provides a wealth of options to help you tailor your block storage to the needs of your workloads. Further, selecting the right Hyperdisk variant and leveraging features such as Storage Pools can help you lower your TCO and simplify management. To learn more, please visit our website. For tailored recommendations, consult your Google Cloud account team.
1. As of March 2025, based on published information for Amazon EBS and Azure managed disks.
2. As of May 2025, compared to Amazon EBS gp3 volumes' maximum IOPS per volume.
3. As of March 2025, at list price, for 50 to 150 TiB, peak IOPS of 25K to 75K, and 25% compressibility, compared to Amazon EBS gp3 volumes.
4. As of March 2025, based on internal Google benchmarking, compared to Rapid Storage, GCSFuse with Anywhere Cache, Parallelstore, and Lustre for larger node sizes.
5. As of March 2025, based on published performance for Microsoft Azure Ultra SSD and Amazon EBS io2 Block Express.
The authors would like to thank David Seidman and Ruwen Hess for their contributions to this blog.