
How to create a Docker named volume and populate it with default files?


I am working on a Flask application that runs inside a Docker container on some embedded hardware. I am trying to store persistent data in a named volume, and that data should come from files that exist on my host machine during the image build. My docker-compose.yml creates and mounts the volume when the container starts, but the volume is empty when it is created. I have several files on my host at /src/data that need to populate the /app/src/data folder on the embedded hardware once the volume is mounted.

I have found a lot of places talking about copying files into volumes, but I can't find the information I need to make this happen from an image.

I read somewhere about using a busybox container to copy the data into the volume before starting my container. The reasoning was that the volume is mounted after the data has been placed in the folder, so the data is hidden for as long as the volume exists. However, this init/busybox method has not worked for me.

Here is my docker-compose.yml:

version: "3.9"
services:
  init:
    image: busybox
    volumes:
      - ./src/data:/src/data:ro
      - appdata:/dest/data
    command: sh -c "cp -r /src/data/* /dest/data/"

  myapp-debug:
    build:
      context: .
      dockerfile: Dockerfile.debug
    volumes:
      - appdata:/app/src/data
    image: ${LOCAL_REGISTRY}:5002/myapp-debug:${TAG}
    ports:
      - 6502:6502
      - 6512:6512
      - 5000:5000
      - 502:502
    depends_on:
      - init

  myapp:
    build:
      context: .
      dockerfile: Dockerfile
    volumes:
      - appdata:/app/src/data
    image: ${DOCKER_LOGIN}/myapp:${TAG}
    ports:
      - 5000:5000
      - 502:502
    restart: always
    depends_on:
      - init

volumes:
  appdata:

It seems like this should be a relatively easy task, but I am not having any luck with it. Copying the files manually is not an option, because this same image will be used in production to program many identical devices.


Solution

  • You clarify in comments:

    ...default [...] files that will be the same by default on every [initial container run]. [...] persistent so that if settings are changed [...] they are not lost.

    I'd address this by having a copy of the default files in your image. However, when you run the container, the persistent storage will hide whatever's in the container on that mount path, so you need to store the default files somewhere else, and set up the container to copy them if needed.

    So in the Dockerfile you might

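    # Assumes WORKDIR /app, so these relative paths resolve under /app/src/
    COPY src/data/ src/default-data/
    
    # -p: don't fail if an earlier COPY already created src/data
    RUN mkdir -p src/data/ \
     && chown appuser src/data/
    # VOLUME /app/src/data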
    

    to make sure the initial data is in the image, and that an empty data directory exists.

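    You'll need a small script that copies the default data into the directory if it's not already there:

    #!/bin/sh
    # entrypoint.sh: seed the (possibly empty) data directory, then run the main command
    set -e
    
    # somefile.txt is any file from the defaults; if it's missing,
    # the storage mounted here has not been initialized yet
    if [ ! -f /app/src/data/somefile.txt ]; then
      # "/." copies the directory's contents, including dotfiles and subdirectories
      cp -R /app/src/default-data/. /app/src/data/
    fi
    
    # Replace this shell with the CMD (or a command-line override)
    exec "$@"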
    

    If you set this script as the Dockerfile ENTRYPOINT, its last line will run the Dockerfile CMD (or whatever command overrides it, for example one given to docker compose run).

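    # entrypoint.sh must be executable (run chmod +x on it before building)
    COPY entrypoint.sh .
    ENTRYPOINT ["./entrypoint.sh"]
    # Keep whatever CMD you had before; this Flask command is only an example
    CMD ["flask", "run", "--host=0.0.0.0"]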
    

    Now the container copies the data itself, so you don't need the init container any more, and you can mount any kind of storage you want on that directory.

    version: '3.8'
    services:
      myapp:
        build: .
        volumes:
          - appdata:/app/src/data
        image: ${DOCKER_LOGIN:-me}/myapp:${TAG:-latest}
        ports:
          - 5000:5000
          - 502:502
        restart: always
    volumes:
      appdata:
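    To check the seed-once logic and the exec "$@" handoff without building an image, you could run something like the following locally. This is a sketch: the temporary directories stand in for /app/src/default-data and /app/src/data, and DEFAULT_DIR and DATA_DIR are hypothetical variables introduced only so the paths can be swapped out. The script is run with sh so it works even if the temporary directory is mounted noexec.

```shell
#!/bin/sh
# Hypothetical local check of the entrypoint logic; not part of the image.
# DEFAULT_DIR and DATA_DIR stand in for /app/src/default-data and /app/src/data.
set -eu

workdir=$(mktemp -d)
trap 'rm -rf "$workdir"' EXIT
DEFAULT_DIR="$workdir/default-data"
DATA_DIR="$workdir/data"
mkdir -p "$DEFAULT_DIR" "$DATA_DIR"
echo "default" > "$DEFAULT_DIR/somefile.txt"
export DEFAULT_DIR DATA_DIR

# Same logic as entrypoint.sh, with the paths taken from the environment.
cat > "$workdir/entrypoint.sh" <<'EOF'
#!/bin/sh
set -e
if [ ! -f "$DATA_DIR/somefile.txt" ]; then
  cp -R "$DEFAULT_DIR/." "$DATA_DIR/"
fi
exec "$@"
EOF

# First run: the "volume" is empty, so the defaults are copied, then the
# arguments (standing in for CMD) are run via exec.
first=$(sh "$workdir/entrypoint.sh" cat "$DATA_DIR/somefile.txt")

# Simulate a settings change that was persisted in the volume.
echo "changed" > "$DATA_DIR/somefile.txt"

# Second run: somefile.txt exists, so nothing is copied and the change survives.
second=$(sh "$workdir/entrypoint.sh" cat "$DATA_DIR/somefile.txt")

echo "first run: $first"
echo "second run: $second"
```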
    

    If you want different behavior when the default files change in the image, you can put whatever logic you need in the entrypoint script.
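    As one possible variation (a sketch, not something the setup above requires), the entrypoint could seed file by file: any default file missing from the volume is copied, and files already in the volume are never overwritten. That way, defaults added in a newer image reach existing devices while changed settings are kept. The seed_missing function name is hypothetical, and this version assumes a flat directory of defaults (the * glob skips subdirectory contents and dotfiles).

```shell
#!/bin/sh
# Hypothetical alternative seeding logic: copy each default file only if the
# destination doesn't already have a file with that name.
set -eu

seed_missing() {
  src=$1
  dest=$2
  for f in "$src"/*; do
    [ -e "$f" ] || continue   # glob matched nothing: no defaults to copy
    name=$(basename "$f")
    if [ ! -e "$dest/$name" ]; then
      cp "$f" "$dest/$name"
    fi
  done
}

# Demo against temporary directories standing in for
# /app/src/default-data and /app/src/data.
workdir=$(mktemp -d)
trap 'rm -rf "$workdir"' EXIT
mkdir -p "$workdir/default-data" "$workdir/data"
echo "default-a" > "$workdir/default-data/a.conf"
echo "default-b" > "$workdir/default-data/b.conf"
echo "edited-a" > "$workdir/data/a.conf"   # the user already changed a.conf

seed_missing "$workdir/default-data" "$workdir/data"

a=$(cat "$workdir/data/a.conf")
b=$(cat "$workdir/data/b.conf")
echo "a.conf: $a"
echo "b.conf: $b"
```

    In entrypoint.sh you would call seed_missing /app/src/default-data /app/src/data in place of the somefile.txt check, before exec "$@". The trade-off is that a changed version of a default file that already exists in the volume is never propagated; handling that would need extra logic of your own, such as a version marker file.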

    (Docker has a default behavior similar to this specifically for named volumes: when an empty named volume is mounted into a container, Docker copies the image's content at that path into it. I'd avoid relying on it, though. It doesn't work with other storage mechanisms like Docker bind mounts or Kubernetes persistent volumes, and since it only happens when the volume is empty, there's no way to update the volume from new content in the image.)