amazon-web-services amazon-s3 amazon-dynamodb data-migration

Export all data from a DynamoDB table to S3


How do I export all of the data in my DynamoDB table to an S3 bucket?

My table is more than 6 months old, and I need all of its data exported to an S3 bucket.

  1. PITR and the built-in export to S3 only allow exports from the prior 35 days, as I understand it.
  2. Data Pipeline is going to be deprecated, and it is costly since I need this only for a single use. After this export, I will use the built-in export functionality.
  3. The ExportTableToPointInTime API also allows exports only within the PITR window.
  4. No AWS Glue, because I don't need any ETL or any write-back to DynamoDB.
  5. No EMR cluster, because it's going to incur cost, and setup and maintenance can be a menace.

I can take a backup of the table, but I want the data available in S3 so that, if needed in the future, we can fetch it directly from S3.

So I need a way to export all of the data from DynamoDB to S3 with minimal cost and infrastructure, for a single use.


Solution

  • TL;DR

    PITR lets you restore or export a table as of any point in time within a sliding 35-day window. Those restores and exports include all of the data in the table as of that point, no matter when it was written. So if you choose to export at the current time, you get every byte of data in the table.
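A minimal sketch of that full export with boto3, assuming PITR is already enabled on the table and that the caller has permission to write to the bucket. The helper names, the table ARN, the bucket, and the prefix are hypothetical. Leaving out `ExportTime` is assumed here to export the table's current state, which is the case described above.

```python
from datetime import datetime
from typing import Optional


def build_export_request(table_arn: str, bucket: str, prefix: str,
                         export_time: Optional[datetime] = None) -> dict:
    """Build keyword arguments for DynamoDB's ExportTableToPointInTime call."""
    request = {
        "TableArn": table_arn,
        "S3Bucket": bucket,
        "S3Prefix": prefix,
        "ExportFormat": "DYNAMODB_JSON",  # "ION" is the other supported format
    }
    if export_time is not None:
        # Only needed for a snapshot at an earlier moment inside the PITR window.
        request["ExportTime"] = export_time
    return request


def start_export(request: dict) -> str:
    """Start the export and return its ARN. Needs boto3 and AWS credentials."""
    import boto3  # imported lazily so the builder above works without boto3

    client = boto3.client("dynamodb")
    response = client.export_table_to_point_in_time(**request)
    return response["ExportDescription"]["ExportArn"]


# Example usage (hypothetical ARN, bucket, and prefix):
# request = build_export_request(
#     "arn:aws:dynamodb:us-east-1:123456789012:table/MyTable",
#     "my-export-bucket",
#     "exports/full",
# )
# print(start_export(request))
```

The export runs asynchronously on the AWS side, so the returned ARN only identifies the job. Its progress can be checked afterwards with the client's `describe_export` call.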

    Answers in-line

    1. PITR and the built-in export to S3 only allow exports from the prior 35 days, as I understand it.

    Your understanding is partly correct. PITR lets you restore your table to a specific point in time within the last 35 days, and the result includes all of your data up to that point. So if you export or restore the table as of the current time, you get all of the data in your table, even data that is several years old.

    2. Data Pipeline is going to be deprecated, and it is costly since I need this only for a single use. After this export, I will use the built-in export functionality.

    Don't use it.

    3. The ExportTableToPointInTime API also allows exports only within the PITR window.

    Again, this is exactly what you need. The time frame is a benefit: it lets you choose a specific second in the last 35 days, but you get all the data up to that point. It's useful if, for example, you noticed that you corrupted your data yesterday at 12:43:43pm and you want to get back to 12:43:42pm.

    4. No AWS Glue, because I don't need any ETL or any write-back to DynamoDB.

    Correct

    5. No EMR cluster, because it's going to incur cost, and setup and maintenance can be a menace.

    Correct, EMR is overkill unless you need transformations.
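
For the rollback case mentioned in the ExportTableToPointInTime answer above, the export timestamp could be picked along these lines. This is only an illustration: the function name is hypothetical, and the fixed 35-day check is a simplification. The real bounds for a table are the `EarliestRestorableDateTime` and `LatestRestorableDateTime` values returned by DescribeContinuousBackups, which can be narrower if PITR was enabled recently.

```python
from datetime import datetime, timedelta, timezone

PITR_WINDOW = timedelta(days=35)  # sliding window described in the answer


def export_time_before(incident: datetime, now: datetime) -> datetime:
    """Return the timestamp one second before `incident`, if it is in the window."""
    if incident.tzinfo is None or now.tzinfo is None:
        raise ValueError("use timezone-aware datetimes")
    target = incident - timedelta(seconds=1)
    if target > now:
        raise ValueError("export time cannot be in the future")
    if now - target > PITR_WINDOW:
        raise ValueError("export time is outside the 35-day PITR window")
    return target


# Corruption noticed at 12:43:43 UTC, so export the state as of 12:43:42 UTC.
now = datetime(2024, 5, 10, 9, 0, 0, tzinfo=timezone.utc)
incident = datetime(2024, 5, 9, 12, 43, 43, tzinfo=timezone.utc)
print(export_time_before(incident, now).isoformat())
```

The window limits which timestamp can be chosen, not how old the exported items can be. An export at any valid timestamp still contains every item that existed in the table at that moment.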