Keep in mind that denormalization has some disadvantages. First off, they aren’t storage-optimal. Although, many times the low cost of BigQuery storage addresses this concern. Second, maintaining data integrity can require increased machine time and sometimes human time for testing and verification. We recommend that you prioritize partitioning and clustering before denormalization, and then focus on data that rarely requires updates.
Optimizing for storage costs
When it comes to optimizing storage costs in BigQuery, you may want to focus on removing unneeded tables and partitions. You can configure the default table expiration for your datasets, configure the expiration time for your tables, and configure the partition expiration for partitioned tables. This can be especially useful if you’re creating materialized views or tables for ad-hoc workflows, or if you only need access to the most recent data.
Additionally, you can take advantage of BigQuery’s long term storage. If you have a table that is not used for 90 consecutive days, the price of storage for that table automatically drops by 50 percent to $0.01 per GB, per month. This is the same cost as Cloud Storage Nearline, so it might make sense to keep older, unused data in BigQuery as opposed to exporting it to Cloud Storage.
Thanks for tuning in this week! Next week, we’re talking about query processing – a precursor to some query optimization techniques that will help you troubleshoot and cut costs. Be sure to stay up-to-date on this series by following me on LinkedIn and Twitter!