BigQuery DataFrames adoption
We launched BigQuery DataFrames last year as an open-source Python library that scales Python data processing without requiring any new infrastructure or APIs. It transpiles common Python data science APIs from pandas and scikit-learn to various BigQuery SQL operators. Since its launch, the amount of data it processes has grown more than 30X, and today thousands of customers use it to process more than 100 PB every month.
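To make the transpiling idea concrete, here is a minimal, self-contained sketch of the general technique: dataframe-style method calls are recorded lazily and then compiled into a single SQL string. The `LazyTable` class, its methods, and the table name are hypothetical and exist only for illustration; this is not the BigQuery DataFrames API or its actual compiler, which covers far more of the pandas surface.

```python
from dataclasses import dataclass, replace
from typing import Optional, Tuple


@dataclass(frozen=True)
class LazyTable:
    """Hypothetical toy dataframe that records operations instead of running them."""

    table: str
    filters: Tuple[str, ...] = ()
    group_key: Optional[str] = None
    mean_column: Optional[str] = None

    def where(self, condition: str) -> "LazyTable":
        # Record the predicate; no data is read or processed locally.
        return replace(self, filters=self.filters + (condition,))

    def groupby(self, key: str) -> "LazyTable":
        # Remember the grouping column for the later aggregation.
        return replace(self, group_key=key)

    def mean(self, column: str) -> "LazyTable":
        # Remember which column to average.
        return replace(self, mean_column=column)

    def to_sql(self) -> str:
        # Compile everything recorded so far into one SQL statement.
        select = "*"
        if self.mean_column is not None:
            agg = f"AVG({self.mean_column}) AS {self.mean_column}_mean"
            select = f"{self.group_key}, {agg}" if self.group_key else agg
        sql = f"SELECT {select} FROM `{self.table}`"
        if self.filters:
            sql += " WHERE " + " AND ".join(self.filters)
        if self.group_key and self.mean_column is not None:
            sql += f" GROUP BY {self.group_key}"
        return sql


# Hypothetical table name, used only to show the generated SQL.
query = (
    LazyTable("my_project.my_dataset.penguins")
    .where("body_mass_g > 3000")
    .groupby("species")
    .mean("body_mass_g")
    .to_sql()
)
print(query)
```

The point of the design is that nothing runs until the SQL is produced, so the heavy lifting can happen inside the warehouse rather than in the Python process.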
Over the last year we evolved the library significantly across 50+ releases and worked closely with thousands of users. Here’s how a couple of early BigQuery DataFrames customers use this library in production.
Deutsche Telekom has standardized on BigQuery DataFrames for its ML platform.
“With BigQuery DataFrames, we can offer a scalable and managed ML platform to our data scientists with minimal upskilling.” – Ashutosh Mishra, Vice President – Data Architecture & Governance, Deutsche Telekom
Trivago, meanwhile, migrated its PySpark transformations to BigQuery DataFrames.
“With BigQuery DataFrames, data science teams focus on business logic and not on tuning infrastructure.” – Andrés Sopeña Pérez, Head of Data Infrastructure, Trivago
What’s new in BigQuery DataFrames 2.0?
This release is packed with features designed to streamline your AI and machine learning pipelines:
Working with multimodal data and generative AI techniques
- Multimodal DataFrames (Preview): BigQuery DataFrames 2.0 introduces a unified dataframe that can handle text, images, audio, and more, alongside traditional structured data, breaking down the barriers between structured and unstructured data. This is powered by BigQuery’s multimodal capabilities enabled by ObjectRef, helping to ensure scalability and governance for even the largest datasets.
When working with multimodal data, BigQuery DataFrames also abstracts many details of working with multimodal tables and processing multimodal data, leveraging BigQuery features behind the scenes such as embedding generation, vector search, Python UDFs, and others.