aws-lambda amazon-dynamodb amazon-dynamodb-streams

How to process DynamoDB stream updates in order across shards?


I want to process a DynamoDB stream to apply updates to a different DynamoDB table in another account. The schemas are different, so I will also be transforming the data in between.

I have thought of the following solution:

  1. Enable DynamoDB streams on the source table.
  2. Process the stream in Lambda. As I understand it, DynamoDB Streams delivers events in order within each shard.
  3. Apply updates to the destination DynamoDB table using the stream record.
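To make the steps concrete, here is a rough sketch of what the Lambda side might look like. It assumes the stream view type includes new images (`NEW_IMAGE` or `NEW_AND_OLD_IMAGES`). The `RENAMES` mapping, `transform`, and `to_destination_ops` are hypothetical names used only for illustration, and the actual write to the destination table is left as a placeholder.

```python
# Hypothetical attribute renames between the source and destination schemas.
RENAMES = {"pk": "id", "payload": "data"}


def transform(image):
    """Rename top-level attributes; values stay in DynamoDB JSON form."""
    return {RENAMES.get(name, name): value for name, value in image.items()}


def to_destination_ops(event):
    """Turn one stream batch into an ordered list of destination operations.

    A batch handed to Lambda comes from a single shard, and its records
    are already in sequence order, so the list preserves that order.
    """
    ops = []
    for record in event["Records"]:
        data = record["dynamodb"]
        if record["eventName"] == "REMOVE":
            ops.append(("delete", data["Keys"], data["SequenceNumber"]))
        else:  # INSERT or MODIFY
            ops.append(("put", transform(data["NewImage"]), data["SequenceNumber"]))
    return ops


def handler(event, context):
    for op in to_destination_ops(event):
        # Placeholder: the real write to the destination table goes here.
        print(op)
```

This only preserves the order within a single batch; it does nothing about ordering between separate invocations.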

I want to apply the updates to the destination table in the same order in which they occurred in the source table.

I was reading the documentation, and it said that updates to the same partition can be spread across multiple shards, and that a Lambda invocation is triggered for each shard in parallel (assuming a parallelization factor of 1 per shard). How, then, can I ensure that the records for each partition are processed in order across shards?

My solution:

I was thinking of attaching some kind of counter to each item update in the source table, and using some global state shared across Lambda invocations to apply updates in order when items with the same partition key end up being processed by different Lambda invocations because they are in different shards.

I think there should be another way to do this, maybe with update timestamps? Are there any better or cleaner ways to do this? Also, please feel free to correct me if I have misunderstood anything.


Solution

  • Just use timestamp ordering. When writing items to the destination, add a condition to the write: this_timestamp > existing_timestamp.

    DynamoDB Streams provides an ApproximateCreationDateTime; however, it's rounded to the nearest second and may not be granular enough for your use case, so it's best to implement your own timestamp. Design carefully so that clock skew and similar issues do not compromise your data consistency.
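As a minimal sketch of the conditional write, assuming the destination is accessed through a boto3 `Table` resource and that each item carries its own timestamp in an attribute called `updated_at` (a hypothetical name; use whatever you add to your items). The helper names `build_conditional_put` and `apply_if_newer` are also made up for this example.

```python
def build_conditional_put(item, ts_attr="updated_at"):
    """Build keyword arguments for Table.put_item that only succeed when
    the incoming timestamp is newer than the stored one (or no item exists).
    """
    return {
        "Item": item,
        "ConditionExpression": "attribute_not_exists(#ts) OR #ts < :incoming",
        "ExpressionAttributeNames": {"#ts": ts_attr},
        # With the boto3 resource API, use int or Decimal here, not float.
        "ExpressionAttributeValues": {":incoming": item[ts_attr]},
    }


def apply_if_newer(table, item, ts_attr="updated_at"):
    """Write the item unless the destination already holds a newer version.

    Returns True if the write was applied, False if it was stale.
    """
    try:
        table.put_item(**build_conditional_put(item, ts_attr))
        return True
    except Exception as err:
        # botocore's ClientError exposes the error code on err.response.
        code = getattr(err, "response", {}).get("Error", {}).get("Code")
        if code == "ConditionalCheckFailedException":
            return False  # an equal or newer update was already applied
        raise
```

With something like `table = boto3.resource("dynamodb").Table("destination")`, each stream record would go through `apply_if_newer(table, transformed_item)`. Stale or out-of-order updates are then skipped, so the order in which the Lambda invocations run no longer matters.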