As an AI/ML developer, you have numerous decisions to make when it comes to choosing your infrastructure, even if you're running on top of a fully managed Google Kubernetes Engine (GKE) environment. While GKE acts as the central orchestrator for your AI/ML workloads, managing compute resources, scaling your workloads, and simplifying complex workflows, you still need to choose an ML framework, your preferred compute (TPUs or GPUs), a scheduler (Ray, Kueue, or Slurm), and how you want to scale your workloads. By the time you need to configure storage, you're facing decision fatigue!
You could simply choose Google Cloud Storage for its size, scale, and cost efficiency. However, Cloud Storage may not be a good fit for every use case. For instance, you might benefit from a storage accelerator in front of Cloud Storage, such as Hyperdisk ML, for faster model-weight load times. But to get the benefit of that acceleration, you would need to build custom workflows to orchestrate data transfer across storage systems.
Introducing GKE Volume Populator
GKE Volume Populator is aimed at organizations that want to store their data in a single data source and let GKE orchestrate the data transfers. To achieve this, GKE leverages the Kubernetes Volume Populator feature through the same PersistentVolumeClaim API that customers use today.
GKE Volume Populator, together with the relevant CSI drivers, dynamically provisions a new destination storage volume and transfers data from your Cloud Storage bucket to that destination volume. Your workload pods then wait to be scheduled until the data transfer is complete.
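In practice, this means you describe the source data once and then reference it from an ordinary PersistentVolumeClaim. The sketch below is illustrative rather than authoritative: the PVC's `dataSourceRef` field is standard Kubernetes, but the `GCPDataSource` kind, API group, and field names shown here are assumptions based on the general Volume Populator pattern, and all resource, bucket, and StorageClass names are placeholders, so check the GKE documentation for the exact schema.

```yaml
# Illustrative sketch only; the exact kind, API group, and spec fields of the
# source resource may differ from the GKE Volume Populator API.

# 1. Describe the Cloud Storage source once (assumed GCPDataSource schema).
apiVersion: datalayer.gke.io/v1          # assumed API group/version
kind: GCPDataSource
metadata:
  name: llm-weights-source
spec:
  cloudStorage:
    bucketName: my-model-weights-bucket  # placeholder bucket name
    serviceAccountName: gcs-reader-ksa   # Kubernetes SA with read access to the bucket
---
# 2. Request an accelerated destination volume and point its dataSourceRef at
#    the source. The populator provisions the volume and copies the data;
#    pods that mount this PVC stay pending until the transfer finishes.
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: llm-weights-pvc
spec:
  accessModes:
    - ReadOnlyMany
  storageClassName: hyperdisk-ml         # assumes a Hyperdisk ML StorageClass exists
  resources:
    requests:
      storage: 300Gi
  dataSourceRef:                         # standard Kubernetes PVC field
    apiGroup: datalayer.gke.io
    kind: GCPDataSource
    name: llm-weights-source
```

A workload pod that mounts `llm-weights-pvc` simply remains unscheduled until the populator reports the transfer as complete, which is what lets GKE, rather than a custom pipeline, own the orchestration.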