Partition and cluster an existing BigQuery table

Sometimes it so happens that we create or are using a table with data that is non-partitioned but we need to convert this into a partitioned table. A typical use-case is old tables that start accumulate data over time. Quite often, we need the same data with extra partitions (and/or clusters), and the rest we can use DDL commands (such as updating metadata).

I’ve created a simple helper script that comes in handy for just that: https://github.com/raghuveer-s/example-code/blob/main/partition-bq-table/partition-bq-table.sql

The script does try to preserve some table and column metadata before partitioning, cluster, and copying the data into a new table. For the most part, you should be able to just change the variable names and use it.

Some final points to consider:

There are some assumptions baked in the script, such as the existence of common dataset where it creates the stored procedures needed.
You probably want to change the region (I have used region-eu in the code). The script makes use of the INFORMATION_SCHEMA and this needs a qualifier (https://cloud.google.com/bigquery/docs/information-schema-intro#syntax). Note for my future self: This really should be parameterized.
If you want to preserve the same table name for the partitioned table, add a couple of commands for changing table name with ALTER TABLE RENAME followed by DROP TABLE of the old table. Note that, changing a table name does come with some limitations in BigQuery (https://cloud.google.com/bigquery/docs/reference/standard-sql/data-definition-language#alter_table_rename_to_statement).

Partition and cluster an existing BigQuery table

Comments

More from this blog

Combine Dataform with Typescript for Improved Workflows

Writing data quality tests in Dataform

A Practical Introduction to Google Cloud Dataform

Using Google Cloud Run to process batch jobs

Command Palette

Comments

More from this blog