postgresqlamazon-rdsparquet

How to load an RDS snapshot (in parquet format) into a local PostgreSQL


There are a million tutorials on how to import PostgreSQL data into RDS, and how to export RDS database snapshots to S3, and how to convert from PostgreSQL to Parquet, but I can't find a single article or SO question about how to properly go the other way: I need to load a database snapshot that RDS exported to S3 (as a series of directories and parquet files) into a local PostgreSQL instance.

I exported the data just fine, and synced it to my local system using the AWS CLI:

aws s3 cp s3://snapshotbucketname .

But what comes next, now that I have the actual snapshot living on my drive? What's the correct way to turn this from "not a psql database at all" into a psql database that I can work with locally?


Solution

  • As far as I can tell this is straight up not possible using the AWS console and CLI tool. Instead you need to set up your database in RDS to be in a VPC that you can access, so that you can then use pg_dump to download your database, after which you can use pg_restore locally.

    For prod databases, they're in a VPC that's heavily restricted (if configured right) and the odds that you directly connect to it are near zero. In those cases, take a snapshot of the database ("maintenance" tab in the RDS console), then select that snapshop, and restore it to a new database, in a VPC and part of a security group that you do have access to. Once that's done, dump it, deleted the new database, and then delete the snapshot so you don't have a phantom db hanging around costing you money.

    Of course, if you're doing that because it's an old version, beware the hidden "thousands of dollars in support" fee that AWS charges for spinning up outdated DB versions. For those situations it might actually be cheaper (by far!) to instead spin up an EC2 instance, give it DB access, literally dump the entire db to file and then uploading the result to an S3 bucket using aws s3 cp that you can then download your database dump from.