Tags: amazon-web-services, amazon-s3, amazon-dynamodb, amazon-emr, aws-data-pipeline

How to export an AWS DynamoDB table to an S3 Bucket?


I have a DynamoDB table with 1.5 million records (about 2 GB). How can I export it to an S3 bucket?

The AWS Data Pipeline method worked for a small table, but I am running into issues exporting the 1.5-million-record table to S3.

On my first attempt, the pipeline job ran for about an hour and then failed with:

java.lang.OutOfMemoryError: GC overhead limit exceeded

I increased the namenode heap size by supplying a hadoop-env configuration object to the instances inside the EMR cluster, following this link.
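For reference, an EMR hadoop-env configuration to raise the namenode heap might look like the sketch below. This uses the standard EMR configuration classification format; the exact heap value (`-Xmx4096m`) is an assumption, not the value from my setup.

```json
[
  {
    "Classification": "hadoop-env",
    "Configurations": [
      {
        "Classification": "export",
        "Properties": {
          "HADOOP_NAMENODE_OPTS": "-Xmx4096m"
        }
      }
    ]
  }
]
```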

After increasing the heap size, my next run failed again after about an hour with a different error, shown in the attached screenshot. I am not sure what to do here to fix this completely.


While checking the AWS CloudWatch graphs for the instances in the EMR cluster, I also noticed that the core node was continuously at 100% CPU usage.

The EMR cluster instance types (master and core node) were m3.2xlarge.


Solution

  • The issue was that the map tasks were not running efficiently; the core node was stuck at 100% CPU usage. I upgraded the cluster instance types to one of the compute-optimized C series, and the export then completed with no issues.
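In a Data Pipeline definition, the instance-type change is made on the `EmrCluster` object. A minimal sketch of the relevant fields is below; the instance type `c4.2xlarge` and the count are assumptions, so pick a C-series size appropriate for your table.

```json
{
  "id": "EmrClusterForBackup",
  "name": "EmrClusterForBackup",
  "type": "EmrCluster",
  "masterInstanceType": "c4.2xlarge",
  "coreInstanceType": "c4.2xlarge",
  "coreInstanceCount": "1",
  "region": "us-east-1"
}
```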