If the id column within the Capacitor file for this table is dictionary-encoded, the system's expression folding will consider all dictionary values and, because none of those values contains two digits, determine that the REGEXP_CONTAINS condition is always false, replacing the WHERE clause with a constant false. As a result, BigQuery skips scanning the Capacitor file for this table entirely, significantly boosting performance. Of course, these optimizations apply across a broad range of scenarios, not just to the query used in this example.
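To make the idea concrete, here is a minimal sketch (not BigQuery's actual implementation) of folding a predicate to a constant by evaluating it only against the dictionary of a dictionary-encoded column; the function name and shapes are illustrative:

```python
import re

def fold_regexp_filter(dictionary, pattern):
    """Return True/False if the predicate has the same result for every
    dictionary value (so it folds to a constant), or None if it must be
    evaluated per row."""
    results = {bool(re.search(pattern, v)) for v in dictionary}
    return results.pop() if len(results) == 1 else None

# A dictionary whose values never contain two consecutive digits:
dictionary = ["a1", "b2", "c3"]
verdict = fold_regexp_filter(dictionary, r"[0-9]{2}")
# verdict is False: the WHERE clause folds to constant false, so the
# scan of this file can be skipped entirely.
```

The key point is that the predicate runs once per distinct dictionary value rather than once per row, and when every value agrees, the whole filter collapses to a constant.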
Data-encoding-enabled optimizations
Our state-of-the-art join algorithm tries to preserve dictionary and run-length-encoded data wherever possible and makes runtime decisions that take data encoding into account. For example, if the probe side of the join key is dictionary-encoded, we can use that knowledge to avoid repeated hash-table lookups. Also, during aggregation, we can skip building a hashmap if the data is already dictionary-encoded and its cardinality is known.
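As a sketch of the aggregation case (illustrative, not BigQuery's code): when group keys are stored as integer codes into a small dictionary of known cardinality, the aggregation can use a flat array indexed by code instead of a hash map keyed on the values:

```python
def sum_by_group(codes, values, cardinality):
    """Aggregate over dictionary-encoded group keys without a hash map."""
    sums = [0] * cardinality          # one slot per dictionary entry
    for code, v in zip(codes, values):
        sums[code] += v               # plain array index, no hash lookup
    return sums

dictionary = ["US", "DE", "JP"]       # known cardinality: 3
codes = [0, 2, 0, 1, 2]               # dictionary-encoded group keys
values = [10, 5, 7, 3, 1]
totals = sum_by_group(codes, values, len(dictionary))
# totals[i] is the running sum for dictionary[i]
```

Because the cardinality is known up front, the output buffer can be sized exactly once, and the per-row cost drops to a single array write.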
Parallelizable join and aggregation algorithms
Enhanced vectorization harnesses sophisticated parallelizable algorithms for efficient joins and aggregations. When parallel execution is enabled in a Dremel leaf node for certain query-execution modes, the join algorithm can build and probe the right-hand-side hash table in parallel using multiple threads. Similarly, aggregation algorithms can perform both local and global aggregations across multiple threads concurrently. This parallel execution of join and aggregation algorithms leads to a substantial acceleration of query execution.
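The local-then-global aggregation pattern can be sketched as follows (a simplified illustration under assumed partitioning, not the engine's actual code): each worker aggregates its own partition, and the partial results are then merged:

```python
from collections import Counter
from concurrent.futures import ThreadPoolExecutor

def local_agg(rows):
    """Phase 1: aggregate one partition independently."""
    partial = Counter()
    for key, value in rows:
        partial[key] += value
    return partial

def parallel_sum_by_key(partitions):
    # Local phase: one task per partition, run concurrently.
    with ThreadPoolExecutor() as pool:
        partials = list(pool.map(local_agg, partitions))
    # Global phase: merge the per-partition partial aggregates.
    merged = Counter()
    for p in partials:
        merged.update(p)
    return dict(merged)

partitions = [[("a", 1), ("b", 2)], [("a", 3), ("c", 4)]]
result = parallel_sum_by_key(partitions)
# result == {"a": 4, "b": 2, "c": 4}
```

The local phase needs no synchronization because each thread touches only its own partial state; contention is confined to the much smaller global merge.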
Tighter integration with Capacitor
We re-engineered Capacitor for the enhanced vectorization runtime, making it smarter and more efficient. This updated version now natively supports semi-structured and JSON data, using sophisticated operators to rebuild JSON data efficiently. Capacitor allows the enhanced vectorization runtime to directly access dictionary and run-length-encoded data and apply various optimizations based on that data. It intelligently folds an expression to a constant when an entire column has the same value. And it can prune expressions in functions expecting NULL, such as IFNULL and COALESCE, when a column is proven to be NULL-free.
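A minimal sketch of the NULL-free pruning idea (the function and metadata names here are hypothetical): if file metadata proves a column contains no NULLs, a wrapper like IFNULL(col, default) can be simplified to just col before execution:

```python
def simplify_ifnull(column_ref, default, null_free_columns):
    """Prune IFNULL/COALESCE-style fallbacks for provably NULL-free columns."""
    if column_ref in null_free_columns:
        return column_ref             # fallback branch pruned away
    return ("IFNULL", column_ref, default)

# Metadata says 'status' never holds NULL, so the wrapper disappears:
expr = simplify_ifnull("status", "'unknown'", null_free_columns={"status"})
# expr == "status"
```

The same metadata check applies to COALESCE: once a NULL-free argument is reached, the remaining fallback arguments can never be evaluated and are dropped.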
Filter pushdown in Capacitor
Capacitor leverages the same vectorized engine as enhanced vectorization to efficiently push down filters and computations. This allows for tailored optimizations based on specific file characteristics and the expressions used. When combined with dictionary and run-length-encoded data, this approach delivers exceptionally fast and efficient data scans, enabling further optimizations like expression folding.
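As an illustration of why pushdown pairs so well with dictionary encoding (a sketch under assumed encodings, not the actual scanner): a pushed-down predicate can be evaluated once per dictionary entry, after which selecting rows is just a cheap lookup on their integer codes:

```python
def filter_dictionary_column(dictionary, codes, predicate):
    """Evaluate the predicate per dictionary entry, then select rows by code."""
    keep = [predicate(v) for v in dictionary]   # |dictionary| evaluations
    return [i for i, code in enumerate(codes) if keep[code]]

dictionary = ["apple", "banana", "cherry"]
codes = [0, 1, 2, 1, 0]                         # dictionary-encoded rows
rows = filter_dictionary_column(dictionary, codes, lambda s: "an" in s)
# rows == [1, 3]: only the "banana" rows pass the filter
```

For a file with millions of rows but a handful of distinct values, this turns a per-row predicate into a per-value one, and the resulting keep-vector is exactly the kind of summary that enables expression folding when it is all-true or all-false.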
Enhanced vectorization in action
Let’s illustrate the power of these techniques with a concrete example. Enhanced vectorization accelerated one query by 21 times, slashing execution time from over one minute (61 seconds) down to 2.9 seconds.
The query that achieved this dramatic speedup was: