amazon-dynamodb amazon-kinesis amazon-dynamodb-streams amazon-kcl

Will the Kinesis Client Library DynamoDB Adapter Lose Data?


The DynamoDB Streams Kinesis Adapter, published on GitHub here, has this function with the following comment:

The Kinesis model provides an adjacent parent shard ID in the event of a parent shard merge. Since DynamoDB Streams does not support merge, this always returns null.

I am concerned about this, and I will describe my concern using an example of 7 shards. For simplicity, let's name them 0 to 6.

Shard 0's parent is no longer available because of the retention policy. Shards 1, 2, 3, 4 and 5 are siblings created by high traffic on the DynamoDB table, and all of them have 0 as their parent. Shard 6 is the currently open shard and is the result of a merge after the traffic spike on the DynamoDB table subsided. I will also assume it can have only one parent, so, arbitrarily, its parent is 3.

So, does this mean that if we start a Worker using this adapter against a DynamoDB Stream in the above state, it will only process shards 0, 3 and 6?


Solution

  • I learned that DynamoDB Streams shards never merge. Even after traffic to the table has died down, each (parallel) shard simply has lower throughput. The situation I described in my question will not happen.

    It also seems that:

    A DynamoDB Stream shard may have at most 1 parent and at most 2 children.

    The bottom line I learned from this question is:

    Kinesis Client Library + the DynamoDB Streams Kinesis Adapter guarantees that all shards will be processed in order, except if you fall behind in processing a shard such that it is trimmed before you process it.
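To illustrate why no shard is skipped when every shard has at most one parent, here is a minimal sketch in Python. It is not the adapter's or the KCL's actual code: `processing_order`, the `lineage` dictionary and the `shard-N` IDs are hypothetical names made up for this example. It only shows that a split-only lineage forms a tree, so walking it from the oldest available shard reaches every shard, with each parent visited before its children.

```python
from collections import defaultdict, deque
from typing import Dict, List, Optional


def processing_order(parents: Dict[str, Optional[str]]) -> List[str]:
    """Return shard IDs so that every shard comes after its parent.

    `parents` maps a shard ID to its parent shard ID, or to None when the
    shard has no parent. A parent ID that is not a key in `parents` is
    treated as already trimmed, so that shard becomes a starting point.
    """
    children = defaultdict(list)
    roots = []
    for shard_id, parent_id in parents.items():
        if parent_id is None or parent_id not in parents:
            roots.append(shard_id)
        else:
            children[parent_id].append(shard_id)

    # Breadth-first walk: a shard is emitted only after its parent.
    order = []
    queue = deque(sorted(roots))
    while queue:
        shard_id = queue.popleft()
        order.append(shard_id)
        queue.extend(sorted(children[shard_id]))
    return order


# Hypothetical lineage: splits only, at most 1 parent and 2 children each.
lineage = {
    "shard-0": None,
    "shard-1": "shard-0",
    "shard-2": "shard-0",
    "shard-3": "shard-1",
    "shard-4": "shard-1",
    "shard-5": "shard-2",
    "shard-6": "shard-2",
}
order = processing_order(lineage)
print(order)
```

The sequential order is a simplification: a real worker can process sibling shards in parallel, and the only constraint modelled here is that a child starts after its parent. On a real stream, the shard-to-parent pairs could be read from the `ShardId` and `ParentShardId` fields returned by the DynamoDB Streams `DescribeStream` call.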