[SOLVED] Apply partitioning and change clustering for a database replicated to BQ with DataFusion

I replicate data from a MySQL database to BigQuery using DataFusion.

My original table in MySQL is not partitioned but I want it to be partitioned by a column when it is replicated to BQ.

Additionally, when I run the replication job in DataFusion, BQ by default clusters the replicated table on the PK column on its own.

Questions:

Is it possible, at the very start of a replication job in DataFusion (MySQL -> BQ), to have some tables partitioned by specific columns while the rest of the tables are replicated as they are?
And is it possible to change or set the clustering columns at the start of the replication process to BQ?

Solution

Response from Google Cloud Community:

BigQuery and DataFusion do not directly support partitioning of existing tables that are replicated.

However, there is a recommended approach for partitioning existing replicated tables, and it works: pause the DataFusion replication, create a new partitioned table, drop the original table, rename the new table to the original name, and then resume the DataFusion replication.

This approach involves some downtime, because the original table is dropped and replaced by the new partitioned table. To minimize the downtime, create the new partitioned table with the same schema as the original table and copy the data from the original table into it. Once the data is copied, drop the original table and rename the new table to the original name.
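As a rough illustration of the create, copy, drop, and rename steps, here is a minimal Python sketch that only builds the BigQuery SQL statements in order; it does not run them. The function name, the `_partitioned` suffix, and the dataset, table, and column names are hypothetical and not part of the original answer. It assumes that a `CREATE TABLE ... AS SELECT *` copy yields a schema the replication job accepts, which should be verified before the original table is dropped. Run the statements only while the replication job is paused.

```python
def build_partition_swap_sql(dataset, table, partition_expr, cluster_columns=()):
    """Return, in execution order, the BigQuery statements that replace
    `dataset.table` with a partitioned (and optionally clustered) copy."""
    source = f"`{dataset}.{table}`"
    # Hypothetical temporary name for the new partitioned table.
    staging = f"`{dataset}.{table}_partitioned`"

    # Step 1: create the partitioned table and copy the data in one statement.
    create = f"CREATE TABLE {staging}\nPARTITION BY {partition_expr}"
    if cluster_columns:
        create += "\nCLUSTER BY " + ", ".join(cluster_columns)
    create += f"\nAS SELECT * FROM {source}"

    return [
        create,
        # Step 2: drop the original, unpartitioned table.
        f"DROP TABLE {source}",
        # Step 3: give the new table the original name.
        f"ALTER TABLE {staging} RENAME TO {table}",
    ]


if __name__ == "__main__":
    # Example values only: a TIMESTAMP column `created_at` is assumed.
    statements = build_partition_swap_sql(
        "my_dataset", "orders", "DATE(created_at)", ["customer_id"]
    )
    for statement in statements:
        print(statement + ";\n")
```

The partition expression is passed in as text because it depends on the column type: a DATE column can be used directly, while a TIMESTAMP column needs something like `DATE(created_at)`. The printed statements can be pasted into the BigQuery console after the replication job is paused.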