
How to deal with persistent storage (e.g. databases) in Docker


How do you deal with persistent storage for your Docker containers?

I am currently using this approach: build the image, e.g. for PostgreSQL, and then start the container with

docker run --volumes-from c0dbc34fd631 -d app_name/postgres

IMHO, this has the drawback that I must never (not even by accident) delete container "c0dbc34fd631".

Another idea would be to mount host directories into the container with "-v". However, the user ID inside the container does not necessarily match the user ID on the host, so file permissions might get messed up.
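
For comparison, here is a minimal sketch of the "-v" alternative. It assumes a hypothetical helper name and an absolute host path (older Docker versions treat a "-v" source without a leading "/" as a volume name), and it uses the real "docker run --user" flag to run the container process under the host user's numeric UID:GID so that files written into the mount keep host-friendly ownership. Whether a particular image (PostgreSQL's included) works correctly under an arbitrary UID is an assumption you would have to check for that image.

```shell
#!/bin/sh
# Hypothetical helper: run IMAGE with HOST_DIR bind-mounted at CONTAINER_DIR,
# executing as the host user's UID:GID so files created in the mount are
# owned by the same numeric IDs on the host.
# Usage: run_with_host_mount /abs/host/dir /container/dir image [command...]
run_with_host_mount() {
    host_dir=$1        # must be an absolute path on the host
    container_dir=$2   # mount point inside the container
    image=$3
    shift 3            # any remaining arguments are the command to run
    docker run --rm \
        --user "$(id -u):$(id -g)" \
        -v "$host_dir:$container_dir" \
        "$image" "$@"
}
```

For example, after mkdir -p "$PWD/pgdata", running run_with_host_mount "$PWD/pgdata" /data busybox touch /data/hello should leave pgdata/hello owned by your own UID (assuming no user-namespace remapping is configured on the daemon).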

Note: Instead of --volumes-from 'cryptic_id' you can also use --volumes-from my-data-container, where my-data-container is a name you assigned to a data-only container, e.g. docker run --name my-data-container ... (see the accepted answer).


Solution

  • Docker 1.9.0 and above

    Use the volume API:

    docker volume create --name hello
    docker run -d -v hello:/container/path/for/volume container_image my_command
    

    This means that the data-only container pattern must be abandoned in favour of the new volumes.

    In fact, the volume API is simply a better way to achieve what the data-container pattern did.

    If you create a container with -v volume_name:/container/fs/path, Docker will automatically create a named volume for you that can:

    1. Be listed with docker volume ls
    2. Be inspected with docker volume inspect volume_name
    3. Be backed up as a normal directory
    4. Be backed up as before through a --volumes-from connection
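
    Points 2 and 3 can be sketched as two small shell helpers. "docker volume inspect --format" prints the volume's host mount point; the backup helper (a hypothetical name, assuming the busybox image and an archive written to the current directory) mounts the named volume read-only into a throwaway container and tars its contents.

```shell
#!/bin/sh
# Print the host path where Docker keeps a named volume's data
# (typically under /var/lib/docker/volumes/, readable only by root).
volume_mountpoint() {
    docker volume inspect --format '{{ .Mountpoint }}' "$1"
}

# Hypothetical helper: archive named volume VOLUME into ./ARCHIVE by mounting
# it read-only into a throwaway busybox container next to the current directory.
# Usage: backup_volume VOLUME ARCHIVE
backup_volume() {
    volume=$1
    archive=$2
    docker run --rm \
        -v "$volume:/data:ro" \
        -v "$(pwd):/backup" \
        busybox tar cvf "/backup/$archive" -C /data .
}
```

    With the volume from the example above, volume_mountpoint hello shows where its files live, and backup_volume hello hello-backup.tar writes ./hello-backup.tar. Going through a container rather than the raw mount point avoids needing root access to /var/lib/docker.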

    The new volume API adds a useful command that lets you identify dangling volumes:

    docker volume ls -f dangling=true
    

    Then remove each one by name:

    docker volume rm <volume name>
    

    As @mpugach points out in the comments, you can get rid of all dangling volumes with a nice one-liner:

    docker volume rm $(docker volume ls -f dangling=true -q)
    # Or, with Docker 1.13.x and above:
    docker volume prune
    

  • Docker 1.8.x and below

    The approach that seems to work best for production is to use a data-only container.

    The data-only container runs on a barebones image and does nothing except expose a data volume.

    You can then run any other container with access to the data container's volumes:

    docker run --volumes-from data-container some-other-container command-to-execute
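
    Creating the data-only container and attaching to it might look like the sketch below. The helper names and the /data path are placeholders; "docker create" is used so the data container is never started, since it only has to exist for --volumes-from to find its volume.

```shell
#!/bin/sh
# Hypothetical helpers for the data-only container pattern (Docker 1.8.x and below).

# Create (but do not start) a container whose only job is to own a volume at /data.
# Other containers reach the volume through this container's name,
# so do not remove it by accident.
# Usage: create_data_container NAME
create_data_container() {
    name=$1
    docker create -v /data --name "$name" busybox true
}

# Run another container that mounts every volume of the data container NAME.
# Usage: run_with_data NAME [docker run options...] IMAGE [command...]
run_with_data() {
    name=$1
    shift
    docker run --volumes-from "$name" "$@"
}
```

    For example, create_data_container pgdata followed by run_with_data pgdata --rm busybox ls /data lists the shared volume, referring to the data container by name instead of a cryptic container ID.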
    

    This blog post gives a good description of the so-called "container as volume" pattern, which clarifies the main point of having data-only containers.

    The Docker documentation now has the definitive description of the container-as-volume(s) pattern.

    The following is the backup/restore procedure for Docker 1.8.x and below.

    BACKUP:

    sudo docker run --rm --volumes-from DATA -v "$(pwd)":/backup busybox tar cvf /backup/backup.tar /data
    

    RESTORE:

    # Create a new data container
    $ sudo docker run -v /data --name DATA2 busybox true
    # untar the backup files into the new container's data volume
    $ sudo docker run --rm --volumes-from DATA2 -v "$(pwd)":/backup busybox tar xvf /backup/backup.tar
    data/
    data/sven.txt
    # Compare to the original container
    $ sudo docker run --rm --volumes-from DATA -v "$(pwd)":/backup busybox ls /data
    sven.txt
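
    The backup and restore steps for Docker 1.8.x and below can be wrapped in two shell functions. This is a sketch under the same assumptions as the commands (data volume at /data, the busybox image, archive in the current directory); the function names are hypothetical.

```shell
#!/bin/sh
# Hypothetical wrappers around the 1.8.x backup/restore commands.

# backup_data_container SOURCE ARCHIVE: tar SOURCE's /data volume into ./ARCHIVE.
backup_data_container() {
    docker run --rm --volumes-from "$1" -v "$(pwd):/backup" \
        busybox tar cvf "/backup/$2" /data
}

# restore_data_container TARGET ARCHIVE: create a new data container TARGET and
# unpack ./ARCHIVE into its /data volume. tar stored the paths as "data/...",
# and busybox's default working directory is "/", so extraction lands in /data.
restore_data_container() {
    docker run -v /data --name "$1" busybox true &&
        docker run --rm --volumes-from "$1" -v "$(pwd):/backup" \
            busybox tar xvf "/backup/$2"
}
```

    backup_data_container DATA backup.tar followed by restore_data_container DATA2 backup.tar reproduces the procedure above. Run them from a shell whose user can talk to the Docker daemon, since sudo cannot invoke shell functions directly.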
    

    Here is a nice article by the excellent Brian Goff explaining why it is good to use the same image for a container and its data container.