We are currently designing a dynamodb table to store certain file attributes. There are 2 main columns
Currently the partition key is Date and sort key is FileName. We expect about 500000 files with distinct file names on each day (this can increase over period of time) . The File names will repeated same each day i.e. a typical schema is as shown below
Date FileName 20190617 abcd.json 20190618 abcd.json
We have a series of queries that is based on Date and a dynamodb trigger. The queries are working great. Currently what we are observing is that the number of concurrent lambda executions are limited to 2, since we are partition by date. While trying to improve the concurrency of lambda we came across 2 solutions
1) Referring the following link (https://docs.aws.amazon.com/amazondynamodb/latest/developerguide/bp-partition-key-sharding.html) , one idea is add a fixed number of random suffixes for Date Field i.e (20190617.1 to 20190617.500) to split the data in to 500 partitions with 1000 records each . This would ensure an amount of concurrency and also there will be minimal changed to query
2) Second option is to change partitioning of table as follows Partition Key :- FileName and SortKey :- Date. This will result in about 500000 partitions , (which can increase) . For querying by date we will need to add a GSI, but we will achieve more concurrency in Lambda
we have not created a table with 500000 partitions (which can increase). Any body has such experience... If so please comment
Any help is appreciated
You seem to be under the mistaken impression that there's a one to one correspondence between partition keys and partitions.
This is not the case.
The number of partitions is driven by table size and through-put. The partition key is hashed by DDB and the data is stored in a particular partition.
You could have 100k partition keys and only a single partition.
If you're pushing the limits of DDB, then yeah you might end up with only a single partition key in a partition...but that's not typical.
The DDB Whitepaper provides some details into how DDB works...