google-cloud-data-fusion

Empty metadata for tables replicated using Google Cloud Data Fusion


I'm using Data Fusion to replicate tables from MysQL production databases into BigQuery. It works well and the latency is low, which is nice. On top of replicating all selected tables, Data Fusion adds a few metadata columns in each replicated table: _sequence_num, _source_timestamp, _row_id and, if the option was selected during job creation, _is_deleted.

While _is_deleted and _sequence_num columns are filled correctly, the _source_timestamp column always remains empty. Did I miss something during the job configuration or is it the expected behavior? I was not able to find anything relevant in the documentation.

Thanks in advance!

I tried multiple jobs configuration but none led to filled metadata columns.


Solution

  • The column _source_timestamp is only captured for CDC streams that provide unordered events, according to the documentation. Oracle is the only source plugin that generates unordered CDC streams today. The column _source_timestamp will not be populated for MySQL databases.Although the columns are present, they will be blank in the bigquery.

    An open issue has been created for this.