I have a DynamoDB table with 1.5 million records (about 2 GB). How do I export it to S3?
The AWS Data Pipeline method worked for a small table, but I am running into issues exporting the 1.5 million record table to my S3 bucket.
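For reference, the export activity in my pipeline definition looks roughly like this (a trimmed sketch based on the console's "Export DynamoDB table to S3" template; the object IDs and the emr-ddb connector JAR version are illustrative and may differ):

```json
{
  "objects": [
    {
      "id": "TableBackupActivity",
      "type": "EmrActivity",
      "input": { "ref": "DDBSourceTable" },
      "output": { "ref": "S3BackupLocation" },
      "runsOn": { "ref": "EmrClusterForBackup" },
      "step": "s3://dynamodb-emr-#{myDDBRegion}/emr-ddb-storage-handler/2.1.0/emr-ddb-2.1.0.jar,org.apache.hadoop.dynamodb.tools.DynamoDbExport,#{output.directoryPath},#{input.tableName},#{input.readThroughputPercent}"
    }
  ]
}
```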
On my first attempt, the pipeline job ran for 1 hour and then failed with:

```
java.lang.OutOfMemoryError: GC overhead limit exceeded
```
I increased the NameNode heap size by supplying a hadoop-env configuration object to the instances in the EMR cluster, following this link.
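Concretely, this meant adding EmrConfiguration and Property objects to the pipeline definition and referencing them from the EmrCluster object. A minimal sketch (the IDs and the 4096 MB heap value are illustrative, not a recommendation):

```json
{
  "objects": [
    {
      "id": "EmrConfigurationHadoopEnv",
      "type": "EmrConfiguration",
      "classification": "hadoop-env",
      "configuration": { "ref": "EmrConfigurationExport" }
    },
    {
      "id": "EmrConfigurationExport",
      "type": "EmrConfiguration",
      "classification": "export",
      "property": { "ref": "NamenodeHeapsizeProperty" }
    },
    {
      "id": "NamenodeHeapsizeProperty",
      "type": "Property",
      "key": "HADOOP_NAMENODE_HEAPSIZE",
      "value": "4096"
    },
    {
      "id": "EmrClusterForBackup",
      "type": "EmrCluster",
      "configuration": { "ref": "EmrConfigurationHadoopEnv" },
      "masterInstanceType": "m3.2xlarge",
      "coreInstanceType": "m3.2xlarge"
    }
  ]
}
```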
After increasing the heap size, my next run failed after 1 hour with a different error (see the attached screenshot). I am not sure how to fix this completely.
Also, while checking the AWS CloudWatch graphs for the instances in the EMR cluster, I noticed the core node was continuously at 100% CPU usage.
The EMR cluster instance types (master and core node) were m3.2xlarge.
The issue was that the map tasks were not running efficiently: the core node was hitting 100% CPU usage. I upgraded the cluster instance types to one of the available compute-optimized C series and the export completed with no issues.
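In the pipeline definition, that only meant changing the instance type fields on the EmrCluster object; a sketch of the change (c4.2xlarge is just an example of a C-series type, use whichever is available in your region):

```json
{
  "id": "EmrClusterForBackup",
  "type": "EmrCluster",
  "masterInstanceType": "c4.2xlarge",
  "coreInstanceType": "c4.2xlarge",
  "coreInstanceCount": "1"
}
```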