BigQuery delivers optimized search/lookup question efficiency by effectively pruning irrelevant recordsdata. Nevertheless, in some circumstances, extra column info is required for search indexes to additional optimize question efficiency. To assist, we lately introduced indexing with column granularity, which lets BigQuery pinpoint related information inside columns, for quicker search queries and decrease prices.
BigQuery arranges desk information into a number of bodily recordsdata, every holding N rows. This information is saved in a columnar format, which means every column has its personal devoted file block. You possibly can study extra about this within the BigQuery Storage Internals weblog. The default search index is on the file stage, which suggests it maintains mappings from a knowledge token to all of the recordsdata containing it. Thus, at question time, the search index helps cut back the search area by solely scanning these related recordsdata. This file-level indexing strategy excels when search tokens are selective, showing in only some recordsdata. Nevertheless, eventualities come up the place search tokens are selective inside particular columns however frequent throughout others, inflicting these tokens to look in most recordsdata, and thus diminishing the effectiveness of file-level indexes.
For instance, think about a state of affairs the place we’ve a set of technical articles saved in a simplified desk named TechArticles
with two columns — Title
and Content material
. And let’s assume that the info is distributed throughout 4 recordsdata, as proven under.