mysqldatabaseamazon-web-servicesamazon-s3aws-dms

AWS DMS - Microsecond precision for CDC on MYSQL as source EndPoint


I am using AWS DMS for migrating data from MYSQL as source endpoint and S3 as target endpoint.
I want to track the updates from source so during the configuration, I have enabled TimestampColumnName property (col name : event_timestamp).
In the result (listed below), I am getting the timestamp of records/events but NOT the micro-second precision.

I want microsecond precision to build sequence logic on top of that.
I have investigated the property of source endpoint as well as target but not getting desired result. Here is the sample output :

enter image description here .

Can somebody take a look and suggest if I am missing any property.
Output format: for my file in S3 is parquet.


Solution

  • Unfortunately DATETIME column added by AWS DMS S3 TimestampColumnName for change data capture (CDC) load from MySQL source will have only second precision.

    Because transaction timestamp in MySQL binary log has only seconds.


    Simplest solution is to add to MySQL table new column - timestamp with microsecond precision with default value to be set on insert or / and update automatically and use this column as event_timestamp.

    ts TIMESTAMP(6) DEFAULT CURRENT_TIMESTAMP ON UPDATE CURRENT_TIMESTAMP
    

    Also, check that in AWS DMS to S3 setting ParquetTimestampInMillisecond is False (or not present / unset, false is default).


    AWS DMS S3 TimestampColumnName setting adds a column with timestamp to the output.

    In 'static' read - it will generate current timestamp:

    For a full load, each row of this timestamp column contains a timestamp for when the data was transferred from the source to the target by DMS.

    For CDC it will read transaction time from database transaction log:

    For a change data capture (CDC) load, each row of the timestamp column contains the timestamp for the commit of that row in the source database.

    And its precision will be the one of the timestamp in database transaction log:

    ...the rounding of the precision depends on the commit timestamp supported by DMS for the source database.

    CDC mode is essentially replication. Source database should be configured appropriately to write such transaction log. Database writes to this log transaction info along with transaction / commit timestamp.

    In case of MySQL this is the binary log. And MySQL binlog timestamp is only 32 bit - just seconds.


    Also, this transaction timestamp may not always be in line with the actual order of transactions or the order changes were actually committed at (link 1, link 2).