database postgresql docker docker-compose test-environments

Way to switch between database dumps in Docker containers


Situation: I'm working on a big startup web project which frequently goes to production, so all development is pretty fast. We have several environments - dev (local), QA, stage and production - with different data in their databases, of course (we use postgres). My daily routine is that while I'm working on some new feature, a QA person may find a critical bug on one of these environments, so I have to fix it, or at least look at what is wrong.

Problem: to switch from my local context to production/QA/stage, I usually dump the database from one of these environments and load it locally, then do the debugging. The problem is that, first, the dumps are very heavy and it sometimes takes up to 30 minutes to download and apply one; second, when I load a dump into my local database, I lose my local development context.

Wish: to be able to switch contexts locally, fast

Example: let's say we have a Docker container for the web server, which is linked to a postgres container as in this sample docker-compose.yml file:

version: '3'

services:
  pg:
    image: "mdillon/postgis"
    hostname: pghost
    ports:
      - "5433:5432"
    volumes:
      - "~/pgdata:/var/lib/postgresql/data"
    ...
  webserver:
    image: "some_app_image"
    links:
      - pg:postgres
    ...

And let's pretend that this web server is really heavy, so that having multiple running containers of it would be a big problem in terms of memory usage and readability.

Question: is there any elegant (or not so elegant) way to switch quickly between different database data sets? Maybe keep different pgdata folders, or somehow link several postgres containers (though I'm not sure that is possible)?
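
For concreteness, here is roughly what I imagine the "different pgdata folders" idea could look like, using Compose variable substitution; the DB_CONTEXT variable and the per-context host directories are made up for illustration, not part of my current setup:

version: '3'

services:
  pg:
    image: "mdillon/postgis"
    hostname: pghost
    ports:
      - "5433:5432"
    volumes:
      # one host directory per context: ~/pgdata-dev, ~/pgdata-qa, ~/pgdata-stage, ...
      - "~/pgdata-${DB_CONTEXT:-dev}:/var/lib/postgresql/data"

Switching would then just be restarting the pg service with a different value, e.g. DB_CONTEXT=qa docker-compose up -d pg, since each context keeps its own data directory on the host.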


Solution

  • It's never going to be truly fast, since backing up, downloading, and restoring large databases simply takes time. All you can really optimize is how often you have to do each of those three things. Your options come down to either using a managed service such as RDS or Google Cloud SQL, which lets you back up and restore databases in the cloud on demand and eliminates the download entirely; or automating the backup generation, download, and restore and running them on a timer, so you always have safe sandbox databases ready to test against (with an orchestration system you can 'download' to the host server and spin up a new database container to restore the backup with a little scripting, which is much faster than pulling the backup across the wire - see the sketch below). The latter is delayed from realtime, which means that if brand-new data reveals a defect you'll still have to wait for your sandbox to refresh, but it may still be worth it.
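
To make the "spin up a new database container per backup" part concrete, here is a minimal sketch of an extra Compose service (standalone, or merged under services: in the file above); the service name pg_qa, the named volume pgdata_qa, the host port 5434 and the ~/dumps/qa directory are assumptions for illustration. It relies on the postgres-based image applying plain-SQL dumps (.sql / .sql.gz) from /docker-entrypoint-initdb.d on the first start with an empty data volume:

version: '3'

services:
  pg_qa:
    image: "mdillon/postgis"
    ports:
      - "5434:5432"
    volumes:
      - "pgdata_qa:/var/lib/postgresql/data"
      # plain-SQL dumps dropped here are restored automatically on first start
      - "~/dumps/qa:/docker-entrypoint-initdb.d"

volumes:
  pgdata_qa:

With something like this next to the regular pg service, docker-compose up -d pg_qa gives a second, already-restored database on port 5434 without touching the local dev data; the web server only needs to be pointed at the other host/port (or re-linked) to switch context.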