Tags: amazon-web-services, amazon-s3, amazon-dynamodb, amazon-emr, amazon-data-pipeline

Export DynamoDB data to S3 using AWS Data Pipeline


I have a DynamoDB table storing 1 GB of data, with RCU and WCU set to 1000 each. I set up a Data Pipeline to export this 1 GB of data to S3. The data lands in S3 split across several partitions. My question is: what decides the number and size of these partitions?


Solution

  • mightyMouse,

    In his recent video "AWS Re:Invent Amazon DynamoDB advanced design patterns – Part 1," Rick Houlihan demonstrates provisioning 100k WCU at table creation and mentions that each extra 1,000 WCU provisioned adds one partition, so his table starts with around 100 partitions. Partitions are internally limited to 10 GB; if a partition hits this limit it splits into two, but otherwise the data stays within a single partition. This suggests that all of your data may still sit within just one partition.

    All of this is abstracted/hidden away from users, but you can still make a rough estimate of the number of partitions; see the sketch below.
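
    Purely as an illustration, here is a small Python sketch of that back-of-the-envelope estimate. It uses the commonly cited (but unofficial) rules of thumb: roughly one partition per 3,000 RCU, one per 1,000 WCU, and about 10 GB of data per partition. The function name and the exact constants are my own assumptions, not an AWS-published formula.

    ```python
    import math

    def estimate_partitions(rcu: int, wcu: int, table_size_gb: float) -> int:
        """Rough, unofficial estimate of a DynamoDB table's partition count."""
        # Commonly cited rule of thumb: ~3,000 RCU or ~1,000 WCU per partition.
        by_throughput = math.ceil(rcu / 3000 + wcu / 1000)
        # Each partition holds at most ~10 GB of data.
        by_size = math.ceil(table_size_gb / 10)
        return max(by_throughput, by_size)

    # The table from the question: 1,000 RCU, 1,000 WCU, ~1 GB of data.
    print(estimate_partitions(1000, 1000, 1))  # -> 2 by this heuristic
    ```

    If you apply only the 1,000-WCU rule from the talk, the question's table lands on a single partition; including the RCU term pushes the estimate to two. Either way it is only a handful, which fits the point above.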

    With On-Demand mode, AWS says it will automatically accommodate up to double your previous peak throughput. The wording is a little odd, but I believe this means more partitions are created for you when a partition is read or written faster than it can handle.

    Something to note: many people have mentioned that once partitions have been provisioned, you can lower the WCU to what you actually need and still keep the provisioned partitions; a rough sketch of this follows below.
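
    As a hedged illustration of that idea, here is a minimal boto3 sketch: create the table with deliberately high write capacity, wait for it to become ACTIVE, then dial the capacity back down. The table name, key schema, and capacity figures are hypothetical, and whether the extra partitions persist is based on the anecdotal reports above rather than documented behaviour.

    ```python
    import boto3

    dynamodb = boto3.client("dynamodb")

    # Create the table with deliberately high WCU so more partitions are allocated.
    dynamodb.create_table(
        TableName="export-demo",
        KeySchema=[{"AttributeName": "pk", "KeyType": "HASH"}],
        AttributeDefinitions=[{"AttributeName": "pk", "AttributeType": "S"}],
        ProvisionedThroughput={"ReadCapacityUnits": 1000, "WriteCapacityUnits": 10000},
    )

    # Wait until the table is ACTIVE before changing its throughput.
    dynamodb.get_waiter("table_exists").wait(TableName="export-demo")

    # Lower the provisioned capacity to what the workload actually needs;
    # per the note above, the partitions created at the higher setting may remain.
    dynamodb.update_table(
        TableName="export-demo",
        ProvisionedThroughput={"ReadCapacityUnits": 1000, "WriteCapacityUnits": 1000},
    )
    ```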