amazon-web-servicesamazon-dynamodbprimary-key-design

Is there a DynamoDB max partition size of 10GB for a single partition key value?


I've read lots of DynamoDB docs on designing partition keys and sort keys, but I think I must be missing something fundamental.

If you have a bad partition key design, what happens when the data for a SINGLE partition key value exceeds 10GB?

The 'Understand Partition Behaviour' section states:

"A single partition can hold approximately 10 GB of data"

How can it partition a single partition key?

http://docs.aws.amazon.com/amazondynamodb/latest/developerguide/GuidelinesForTables.html#GuidelinesForTables.Partitions

The docs also talk about limits with a local secondary index being limited to 10GB of data after which you start getting errors.

"The maximum size of any item collection is 10 GB. This limit does not apply to tables without local secondary indexes; only tables that have one or more local secondary indexes are affected."

http://docs.aws.amazon.com/amazondynamodb/latest/developerguide/LSI.html#LSI.ItemCollections

That I can understand. So does it have some other magic for partitioning the data for a single partition key if it exceeds 10GB. Or does it just keep growing that partition? And what are the implications of that for your key design?

The background to the question is that I've seen lots of examples of using something like a TenantId as a partition key in a multi-tentant environment. But that seems limiting if a specific TenantId could have more than 10 GB of data.

I must be missing something?


Solution

  • TL;DR - items can be split even if they have the same partition key value by including the range key value into the partitioning function.


    The long version:

    This is a very good question, and it is addressed in the documentation here and here. As the documentation states, items in a DynamoDB table are partitioned based on their partition key value (which used to be called hash key) into one or multiple partitions, using a hashing function. The number of partitions is derived based on the maximum desired total throughput, as well as the distribution of items in the key space. In other words, if the partition key is chosen such that it distributes items uniformly across the partition key space, the partitions end up having approximately the same number of items each. This number of items in each partition is approximately equal to the total number of items in the table divided by the number of partitions.

    The documentation also states that each partition is limited to about 10GB of space. And that once the sum of the sizes of all items stored in any partition grows beyond 10GB, DynamoDB will start a background process that will automatically and transparently split such partitions in half - resulting in two new partitions. Once again, if the items are distributed uniformly, this is great because each new sub-partition will end up holding roughly half the items in the original partition.

    An important aspect to splitting is that the throughput of the split-partitions will each be half of the throughput that would have been available for the original partition.

    So far we've covered the happy case.

    On the flip side it is possible to have one, or a few, partition key values that correspond to a very large number of items. This can usually happen if the table schema uses a sort key and several items hash to the same partition key. In such case, it is possible that a single partition key could be responsible for items that together take up more than 10 GB. And this will result in a split. In this case DynamoDB will still create two new partitions but instead of using only the partition key to decide which sub-partition should an item be stored in, it will also use the sort key.

    Example

    Without loss of generality and to make things easier to reason about, imagine that there is a table where partition keys are letters (A-Z), and numbers are used as sort keys.

    Imaging that the table has about 9 partitions, so letters A,B,C would be stored in partition 1, letters D,E,F would be in partition 2, etc.

    In the diagram below, the partition boundaries are marked h(A0), h(D0) etc. to show that, for instance, the items stored in the first partition are the items who's partition key hashes to a value between h(A0) and h(D0) - the 0 is intentional, and comes in handy next.

    [ h(A0) ]--------[ h(D0) ]---------[ h(G0) ]-------[ h(J0) ]-------[ h(M0) ]- ..
      |   A    B    C   |       E    F   |   G      I    |   J    K   L  |
      |   1    1    1   |       1    1   |   1      1    |   1    1   1  |
      |   2    2    2   |       2    2   |          2    |        2      |
      |   3         3   |            3   |          3    |               |
      ..                ..               ..              ..              ..
      |            100  |           500  |               |               |
      +-----------------+----------------+---------------+---------------+-- ..
    

    Notice that for most partition key values, there are between 1 and 3 items in the table, but there are two partition key values: D and F that are not looking too good. D has 100 items while F has 500 items.

    If items with a partition key value of F keep getting added, eventually the partition [h(D0)-h(G0)) will split. To make it possible to split the items that have the same hash key, the range key values will have to be used, so we'll end up with the following situation:

    ..[ h(D0) ]------------/ [ h(F500) ] / ----------[ h(G0) ]- ..
          |       E       F       |           F         |
          |       1       1       |          501        |
          |       2       2       |          502        |
          |               3       |          503        |
          ..                      ..                    ..
          |              500      |         1000        |
    .. ---+-----------------------+---------------------+--- ..
    

    The original partition [h(D0)-h(G0)) was split into [h(D0)-h(F500)) and [h(F500)-h(G0))

    I hope this helps to visualize that items are generally mapped to partitions based on a hash value obtained by applying a hashing function to their partition key value, but if need be, the value being hashed can include the partition key + a sort key value as well.