Switching from BigQuery’s logical storage to bodily storage can dramatically scale back your storage prices, and has for many shoppers we’ve labored with.
However for those who think about BigQuery time journey and fail-safe prices, it might find yourself costing you much more than logical storage – or generate greater storage prices than you have been anticipating.
On this put up, we’ll cowl:
- The pitfalls with BigQuery’s time journey and fail-safe storage that inflate your prices
- Your choices for getting round these pitfalls
The pitfalls with BigQuery time journey and fail-safe
BigQuery’s time journey function allows you to entry information that is been modified or deleted from any time limit inside a particular window. By default, your time journey window is seven days, however you possibly can modify it down to 2 days.
BigQuery’s fail-safe function retains deleted information for a further seven days after the time journey window for emergency information restoration (it was, till just lately, 14 days). And in contrast to with time journey, you possibly can’t modify the info retainment interval. Nonetheless, to get fail-safe information restored, you’ll want to open a ticket with Google Assist.
You pay for each time-travel and fail-safe storage prices when on bodily storage at lively bodily storage charges, whereas on logical storage you don’t pay for both.
The chart beneath simulates the scenario the place:
- Whole long-term bodily storage is 200 GiB after which 50 GiB is deleted,
- The time-travel window is seven days, which is then adopted by
- A fail-safe interval of seven days.
Think about the story described beneath from a stay BigQuery Q&A we just lately held:
We had a buyer who deleted a big desk that was in long-term bodily storage and time-travel saved the deleted desk in case it will must be restored. As soon as deleted, the desk information was transformed to lively storage inside time-travel and the fail-safe storage, for a complete of 21 days (seven on time-travel, 14 on fail-safe again when it was 14 days) the client unknowingly paid the lively storage price, resulting in an unexpectedly-large storage invoice.
Choice #1: Tweak your BigQuery time journey settings
By default, BigQuery time journey is about to seven days and will be diminished all the way down to as little as two days. If you happen to really feel that you would be able to afford to scale back information resilience, you possibly can scale back this setting to scale back the prices of the info saved in time-travel.
As an example, when you’ve got a sturdy backup course of from BQ to different sources like GCS, lowering your time journey interval to 2 days ought to be nice. A few of our prospects arrange auto-archive insurance policies/pipelines for his or her information to again that information up periodically, so in situations like this it additionally is sensible to have shorter time journey durations as a result of that information is already being backed up outdoors BigQuery.
From our expertise, most prospects genuinely do not want the total seven-day time journey interval, as even when they diminished it to 2 days, they’d nonetheless get the seven-day fail-safe choice on prime of that. So if slicing prices is extra vital than having the total seven days of information historical past, then you definately shouldn’t be too impacted by slicing down your time journey interval to 2 days.
Alternatively, if you’re about to delete a considerable amount of information, it could be price decreasing the time-travel interval to 2 days previous to deleting the info simply to scale back your storage footprint inside it quickly. Simply be sure to set this again to the earlier worth for those who want it!
Choice #2: Convert your desk to BigQuery logical storage earlier than deleting it
The opposite resolution is to change your dataset’s storage billing mannequin to logical storage earlier than deleting the info since you’re not billed for time-travel or fail-safe storage beneath this mannequin.
Be aware that you would be able to solely do that change as soon as each 14 days, so that you would want to attend 14 days earlier than altering again. Moreover, it will probably take as much as 24 hours for the change in storage mannequin to take impact.
Nonetheless, earlier than you make the change ensure that to run this question to check your prices with 14 days of logical storage vs. 9 days of lively bodily storage (minimal 2 time journey days + 7 fail-safe storage days).
Under is an instance output of this question, with “additional_costs_for_physical_storage” containing each time journey and fail-safe storage prices:
dataset | logical_active_price | base_physical_price | logical_long_term_price | base_long_term_price | additional_costs_for_physical_storage | logical_storage_price | physical_storage_price | difference_in_price_if_physical_is_chosen | suggestion |
warehouse | $ 0.57 | $ 0.07 | $ 0.00 | $ 0.00 | $ 62.27 | $ 0.57 | $ 62.34 | $ -61.77 | Maintain dataset on logical storage |
Choice #3: Don’t change to BigQuery bodily storage to start with
In case you are always including and deleting information, chances are high your time journey and fail-safe information volumes will likely be fairly excessive, so it may not even make sense to change to bodily storage within the first place.
It is because you aren’t charged for time journey and fail-safe information within the logical storage mannequin however are within the bodily storage mannequin.
So when you’ve got giant quantities of information in these areas, this will bloat the prices previous what logical storage would cost. It will likely be vital to look at these specific information volumes when contemplating the change to bodily storage.
The question shared above cycles by a venture (and single area), and appears at every of the datasets to provide the value breakdown and suggestion.
It is sensible to change to BigQuery bodily storage within the following situations:
- The vast majority of your information is textual content/strings (i.e. json, addresses, logs, and many others.) as they’ve the best compression ratios.
- When there’s a strong information lifecycle plan in place (for example, a desk partitioned by day and every partition expires after X variety of days).
- Information is just not being modified always in desk/partitions (as talked about above, this causes elevated time-travel and fail-safe information storage).
Last ideas
Whereas in lots of circumstances, switching from BigQuery logical storage to bodily storage can certainly scale back your storage prices, it is good to pay attention to the potential pitfalls that might erase the financial savings profit.