amazon-web-services, amazon-dynamodb, amazon-data-pipeline, aws-data-pipeline

Increase & Decrease DynamoDB RCU from AWS Data Pipeline


I have an AWS DynamoDB table that is write-intensive. I've configured it in provisioned capacity mode with 10,000 WCU and 1,000 RCU.

I'm using AWS Data Pipeline to export the DynamoDB table's contents to S3. The pipeline is configured with a read throughput ratio of 75%.

It takes around 2 hours to export ~150 GB of data with this setting. When I increased the RCU to 10,000, the export completed in less than 20 minutes.

Is there any way in Data Pipeline to increase the provisioned RCU only while my pipeline is running? This pipeline is configured to run only once a day.


Solution

  • You can’t control the DynamoDB capacity from within the data pipeline job.

    However, you can use AWS Step Functions to orchestrate ETL jobs alongside other arbitrary steps. So your solution could be a scheduled CloudWatch Events rule that starts a Step Functions state machine to:

    1. Set the capacity of your DynamoDB table. (I think you would need to write a simple Lambda function for this because Step Functions can’t do it directly.)
    2. Invoke a Lambda that starts the Data Pipeline job.
    3. Wait for the job to complete.
    4. Reset the read capacity of the table.
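    Steps 1, 2, and 4 can be sketched as two small Lambda handlers. This is a sketch assuming `boto3`; the table name and event shapes are hypothetical, and the `boto3` imports are deferred into the handlers so the pure helper has no AWS dependency:

    ```python
    def build_throughput_update(table_name, rcu, wcu):
        """Build the kwargs for dynamodb.update_table (kept pure so it is easy to test)."""
        return {
            "TableName": table_name,
            "ProvisionedThroughput": {
                "ReadCapacityUnits": rcu,
                "WriteCapacityUnits": wcu,
            },
        }

    def set_capacity_handler(event, context):
        """Steps 1 and 4: raise or reset the table's provisioned throughput.

        Invoked with an event such as {"rcu": 10000, "wcu": 10000}.
        """
        import boto3  # lazy import: only the handler itself needs AWS access
        dynamodb = boto3.client("dynamodb")
        dynamodb.update_table(**build_throughput_update(
            "my-write-heavy-table",  # hypothetical table name
            event["rcu"],
            event["wcu"],
        ))

    def start_pipeline_handler(event, context):
        """Step 2: activate the existing Data Pipeline export.

        Invoked with an event such as {"pipeline_id": "df-0123456789ABCDEF"}.
        """
        import boto3
        datapipeline = boto3.client("datapipeline")
        datapipeline.activate_pipeline(pipelineId=event["pipeline_id"])
    ```

    Note that one handler covers both step 1 and step 4: the state machine can invoke `set_capacity_handler` twice with different event payloads (the high RCU before the export, the original 1,000 RCU after).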
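    The "wait for the job to complete" step (step 3) maps naturally onto a Wait/Choice polling loop in the state machine. Below is a sketch of such an Amazon States Language definition, built as a plain Python dict; all ARNs, state names, capacity values, and the `{"finished": ...}` status shape are illustrative assumptions, and the status-checking Lambda itself is not shown:

    ```python
    def build_export_state_machine(set_capacity_arn, start_pipeline_arn, check_status_arn):
        """Return an ASL definition: raise RCU -> start export -> poll -> reset RCU."""
        return {
            "Comment": "Raise DynamoDB RCU only while the daily export runs",
            "StartAt": "RaiseReadCapacity",
            "States": {
                "RaiseReadCapacity": {
                    "Type": "Task",
                    "Resource": set_capacity_arn,
                    "Parameters": {"rcu": 10000, "wcu": 10000},  # illustrative values
                    "Next": "StartExportPipeline",
                },
                "StartExportPipeline": {
                    "Type": "Task",
                    "Resource": start_pipeline_arn,
                    "Next": "WaitForExport",
                },
                "WaitForExport": {
                    "Type": "Wait",
                    "Seconds": 300,  # poll every 5 minutes
                    "Next": "CheckExportStatus",
                },
                "CheckExportStatus": {
                    "Type": "Task",
                    "Resource": check_status_arn,  # Lambda reporting {"finished": true/false}
                    "Next": "IsExportDone",
                },
                "IsExportDone": {
                    "Type": "Choice",
                    "Choices": [
                        {
                            "Variable": "$.finished",
                            "BooleanEquals": True,
                            "Next": "ResetReadCapacity",
                        }
                    ],
                    "Default": "WaitForExport",  # not done yet: keep polling
                },
                "ResetReadCapacity": {
                    "Type": "Task",
                    "Resource": set_capacity_arn,
                    "Parameters": {"rcu": 1000, "wcu": 10000},  # back to the original RCU
                    "End": True,
                },
            },
        }
    ```

    The resulting dict can be serialized with `json.dumps` and passed as the `definition` argument of the Step Functions `create_state_machine` API, with the CloudWatch Events schedule targeting the state machine.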

    Additional Resources