How to link docker images to their composing layers on the disk?


Since Docker v1.10 introduced content-addressable storage, Docker has completely changed how image data is handled on disk. As I understand it, layers and images are now separate: layers are just collections of files and directories that have no notion of images and can be shared freely across images. See the update and a blog post for a better explanation.

During docker push and docker pull, the output on stdout shows the layers being transferred, although the SHA hashes appear to be regenerated at the destination.

For an image built locally on the ubuntu:14.04 base, the docker history command shows the chain of intermediate images used during the build and the disk space each one contributed.

root@ruifeng-VirtualBox:/var/lib/docker/aufs/diff# docker history image_size
IMAGE               CREATED             CREATED BY                                      SIZE                COMMENT
9ae1f372d83c        11 weeks ago        /bin/sh -c #(nop)  CMD ["/bin/sh" "-c" "/bin/   0 B                 
aaf66e9fa85b        11 weeks ago        /bin/sh -c chown -R martian /home/martian       6.299 MB            
9568768134c1        11 weeks ago        /bin/sh -c rm -rf /home/martian/potatoes        0 B                 
2f40f3f58306        11 weeks ago        /bin/sh -c mv /home/martian/water_tanks /home   6.289 MB            
062e2702ffa2        11 weeks ago        /bin/sh -c mv /home/martian/potatoes /home/ma   5.394 kB            
7b2d8b4c1dd0        11 weeks ago        /bin/sh -c chown -R martian /home/martian       6.299 MB            
8fd47fed98d6        11 weeks ago        /bin/sh -c #(nop) COPY dir:421da6c71a1f252881   6.289 MB            
...

I can also use the docker inspect command to list the underlying layers.

root@ruifeng-VirtualBox:/var/lib/docker/aufs/diff# docker inspect image_size | jq -r '.[].RootFS'
{
  "Layers": [
    "sha256:a85f35566a268e6f4411c5157ffcffe4f5918b068b04d79fdd80003901ca39da",
    "sha256:eaaf7298332642da0f8190fa4b96ad46c04b9c1d1682bc3a35d77bded2b1e0a9",
    "sha256:33a212e8aa5642d3a2ddead146e85912407fc5bbb2a896dab11fcf329177a999",
    "sha256:f1f25d8c6e56dc4891df147a77f57e756873b57f33ce95e6a0acbe47117c0c8a",
    "sha256:67852b7d2cf5f0885293fa9df91ebfd8ef0c42ba11a5155f94806f3a96c5e916",
    "sha256:480d48b7e2864a44c1b2fca0c7e32fbab505f7526ccb25bbfed191c04a9bb7b0",
    "sha256:18d270fe64aa423e0ffdf24faf0103432027da3d5c12f4505e7daedad9fe2195",
    "sha256:a73c3f5eb83790bc6d03381a43a20aef7d0d9d97de0cff4b040e8e4c01a3aee5",
    "sha256:e8d1b67ace73cb92cc00725354e84024153bedae4280149c03fcb52f34d83757",
    "sha256:19a4b80afc677825fec94adf8b6a45a866f42a38675f87f86e50171ff5e0a280",
    "sha256:77d412270fbdd9baba1fe73028b786c3a1709feefa9b03be74b8e9f9ce148635",
    "sha256:2ad21e37389addd577161c981d0c69ab60aa47945172f41f9ec71ada1c1dd4ee",
    "sha256:771d1e47ca8d8dcf55069786e4c499894fba86f704c808413df00f4f980564e1",
    "sha256:f9c02c6fa436213c0f220d49c4ee1b913372081010d4506757ec75d3e788847c"
  ],
  "Type": "layers"
}

My question is, how do I link these layers marked with SHA hashes to the images listed in the IMAGE column of the previous command output? And is there a way to find out the actual location and size of these layers on the disk?

If I am not mistaken, the layers should be kept in /var/lib/docker/aufs/diff when the storage driver is aufs. But the entries in that folder are named with seemingly random IDs that do not literally match any of the layer hashes. It seems the mapping is kept only inside Docker Engine, for security reasons.


Solution

  • Inspired by the answer from larsks, I managed to find the location of the layers.

    For example, suppose we want to find the location of the layer contributed by the COPY step, which corresponds to the intermediate image with ID 8fd47fed98d6. We can inspect that image first.

    root@ruifeng-VirtualBox:/var/lib/docker# docker inspect 8fd47fed98d6 | jq -r '.[].RootFS'
    {
      "Layers": [
        "sha256:a85f35566a268e6f4411c5157ffcffe4f5918b068b04d79fdd80003901ca39da",
        "sha256:eaaf7298332642da0f8190fa4b96ad46c04b9c1d1682bc3a35d77bded2b1e0a9",
        "sha256:33a212e8aa5642d3a2ddead146e85912407fc5bbb2a896dab11fcf329177a999",
        "sha256:f1f25d8c6e56dc4891df147a77f57e756873b57f33ce95e6a0acbe47117c0c8a",
        "sha256:67852b7d2cf5f0885293fa9df91ebfd8ef0c42ba11a5155f94806f3a96c5e916",
        "sha256:480d48b7e2864a44c1b2fca0c7e32fbab505f7526ccb25bbfed191c04a9bb7b0",
        "sha256:18d270fe64aa423e0ffdf24faf0103432027da3d5c12f4505e7daedad9fe2195",
        "sha256:a73c3f5eb83790bc6d03381a43a20aef7d0d9d97de0cff4b040e8e4c01a3aee5",
        "sha256:e8d1b67ace73cb92cc00725354e84024153bedae4280149c03fcb52f34d83757",
        "sha256:19a4b80afc677825fec94adf8b6a45a866f42a38675f87f86e50171ff5e0a280"
      ],
      "Type": "layers"
    }
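
Taking the last entry of Layers works because, as far as I can tell, a build step appends a diff ID only when it changes the filesystem; steps such as CMD add none. The image configuration JSON records this in its history array through an empty_layer flag, so build steps and layers can be paired in one pass. Below is a minimal sketch of that pairing. The function name pair_steps_with_layers is made up for illustration, and the config path in the usage comment is my assumption about where the engine keeps the file. It relies on jq, which is already used above.

```shell
# Pair each build step with the layer diff ID it produced.
# $1: path to an image configuration JSON file.
# Assumed usage (path layout is an assumption, not verified here):
#   pair_steps_with_layers /var/lib/docker/image/aufs/imagedb/content/sha256/<full image id>
pair_steps_with_layers() {
    jq -r '
        .rootfs.diff_ids as $ids
        # Walk the history oldest step first, consuming one diff ID
        # for every step that is not flagged as an empty layer.
        | reduce .history[] as $h ({i: 0, rows: []};
            if $h.empty_layer == true
            then .rows += [[($h.created_by // ""), "-"]]
            else .rows += [[($h.created_by // ""), $ids[.i]]] | .i += 1
            end)
        | .rows[] | @tsv
    ' "$1"
}
```

Each output row is a build command and its diff ID separated by a tab, with "-" for steps that produced no layer. Rows are printed oldest step first, which is the reverse of the docker history listing.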
    

    Now we look for the last layer on disk.

    root@ruifeng-VirtualBox:/var/lib/docker# find . -name '*19a4b80afc677825fec94adf8b6a45a866f42a38675f87f86e50171ff5e0a280*'
    root@ruifeng-VirtualBox:/var/lib/docker# 
    

    Nothing on the disk carries that name. Perhaps some level of indirection is involved, so we can search the contents of the files in the layerdb instead.

    root@ruifeng-VirtualBox:/var/lib/docker# grep -rl 19a4b80afc677825fec94adf8b6a45a866f42a38675f87f86e50171ff5e0a280 image/aufs/layerdb/
    image/aufs/layerdb/sha256/f1824ce70e6d1e8f140b9ba637b7447c00d8158d3bbc1f72b491766ab54dd449/diff
    

    So the layer's hash is recorded in the diff file of a layerdb entry named f1824ce70e6d1e8f140b9ba637b7447c00d8158d3bbc1f72b491766ab54dd449. Let's find it.

    root@ruifeng-VirtualBox:/var/lib/docker# find . -name '*f1824ce70e6d1e8f140b9ba637b7447c00d8158d3bbc1f72b491766ab54dd449*'
    ./image/aufs/layerdb/sha256/f1824ce70e6d1e8f140b9ba637b7447c00d8158d3bbc1f72b491766ab54dd449 
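
The directory name is not random either. As far as I understand the image specification, it is the layer's chain ID: the bottom layer's chain ID equals its diff ID, and every later one is the SHA-256 of the parent's chain ID, a space, and the layer's own diff ID. Here is a small sketch of that rule, where chain_ids is a hypothetical helper name and sha256sum from coreutils is assumed to be installed:

```shell
# Print the chain ID of every layer, given diff IDs ordered bottom layer first.
# Assumed rule: chain_id(0) = diff_id(0)
#               chain_id(n) = "sha256:" + sha256(chain_id(n-1) + " " + diff_id(n))
chain_ids() {
    local chain="" diff
    for diff in "$@"; do
        if [ -z "$chain" ]; then
            chain="$diff"
        else
            chain="sha256:$(printf '%s %s' "$chain" "$diff" | sha256sum | cut -d' ' -f1)"
        fi
        printf '%s\n' "$chain"
    done
}

# Example (needs a running Docker engine and jq):
#   chain_ids $(docker inspect 8fd47fed98d6 | jq -r '.[].RootFS.Layers[]')
```

If the rule holds for this Docker version, the last line printed for the intermediate image would be expected to match the layerdb directory name found above, which removes the need to grep for it.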
    

    Its cache-id file points to the actual location in the aufs/diff folder.

    root@ruifeng-VirtualBox:/var/lib/docker# cat image/aufs/layerdb/sha256/f1824ce70e6d1e8f140b9ba637b7447c00d8158d3bbc1f72b491766ab54dd449/cache-id 
    c097799b7946231fb60511b442c10cd0b56ee17a12b376149f305adda67e7637
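
The grep and cat steps above can be folded into one helper. This is only a sketch under the directory layout observed in this walkthrough; find_layer_dirs is a name I made up, and the Docker root and storage driver are passed in as parameters. Since the same diff ID could in principle sit on top of different parents, it prints every matching entry rather than stopping at the first.

```shell
# Look up the layerdb entries whose diff file holds a given diff ID.
# $1: Docker root (e.g. /var/lib/docker)
# $2: storage driver name (e.g. aufs)
# $3: diff ID including the "sha256:" prefix
# Prints "<layerdb directory name> <cache-id>" per match; returns 1 if none.
find_layer_dirs() {
    local root="$1" driver="$2" diff_id="$3" dir found=1
    for dir in "$root/image/$driver/layerdb/sha256"/*/; do
        [ -f "${dir}diff" ] && [ -f "${dir}cache-id" ] || continue
        if [ "$(cat "${dir}diff")" = "$diff_id" ]; then
            printf '%s %s\n' "$(basename "$dir")" "$(cat "${dir}cache-id")"
            found=0
        fi
    done
    return "$found"
}

# Example, run as root:
#   find_layer_dirs /var/lib/docker aufs sha256:19a4b80afc677825fec94adf8b6a45a866f42a38675f87f86e50171ff5e0a280
```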
    

    Let's go into the location and check.

    root@ruifeng-VirtualBox:/var/lib/docker# cd aufs/diff/c097799b7946231fb60511b442c10cd0b56ee17a12b376149f305adda67e7637
    root@ruifeng-VirtualBox:/var/lib/docker/aufs/diff/c097799b7946231fb60511b442c10cd0b56ee17a12b376149f305adda67e7637# find .
    .
    ./home
    ./home/martian
    ./home/martian/water_tanks
    ./home/martian/water_tanks/IMG_0052.JPG
    root@ruifeng-VirtualBox:/var/lib/docker/aufs/diff/c097799b7946231fb60511b442c10cd0b56ee17a12b376149f305adda67e7637#
    

    It contains all files and directories that were intended to be copied into the image by the COPY step. The size of the layer can be checked as well.

    root@ruifeng-VirtualBox:/var/lib/docker# du -sh aufs/diff/c097799b7946231fb60511b442c10cd0b56ee17a12b376149f305adda67e7637
    6.1M    aufs/diff/c097799b7946231fb60511b442c10cd0b56ee17a12b376149f305adda67e7637
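
Besides du, each layerdb entry appears to carry a size file holding the layer size in bytes as recorded by the engine. I have not verified this on every Docker version, so treat it as an assumption. The sketch below prints both numbers side by side; layer_sizes is a hypothetical name, and the on-disk path follows the aufs layout used in this walkthrough (other storage drivers arrange their directories differently).

```shell
# Report the recorded size and the on-disk usage of one layer.
# $1: Docker root, $2: storage driver (aufs layout assumed),
# $3: layerdb directory name (the hex string without "sha256:")
layer_sizes() {
    local root="$1" driver="$2" chain="$3" meta cache recorded on_disk
    meta="$root/image/$driver/layerdb/sha256/$chain"
    [ -f "$meta/cache-id" ] || return 1
    cache="$(cat "$meta/cache-id")"
    # The size file is an assumption; fall back to "unknown" if it is absent.
    recorded="unknown"
    [ -f "$meta/size" ] && recorded="$(cat "$meta/size")"
    # du -sk reports usage in 1024-byte blocks, followed by a tab and the path.
    on_disk="$(du -sk "$root/$driver/diff/$cache" | cut -f1)"
    printf 'recorded_bytes=%s on_disk_kib=%s\n' "$recorded" "$on_disk"
}

# Example, run as root:
#   layer_sizes /var/lib/docker aufs f1824ce70e6d1e8f140b9ba637b7447c00d8158d3bbc1f72b491766ab54dd449
```

The two figures need not agree exactly, since du counts whole disk blocks and directory entries.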
    

    Inspecting the subsequent layers in the same manner gives a good deal of insight into the union file system and the copy-on-write mechanism used by Docker.

    The lookup also works in reverse. We can search for a file or directory that is supposed to be in the image, which should be somewhere inside aufs/diff, and then use the cache-id to trace back to the layer.

    root@ruifeng-VirtualBox:/var/lib/docker# find . -name '*water_tanks*'
    ./aufs/diff/c097799b7946231fb60511b442c10cd0b56ee17a12b376149f305adda67e7637/home/martian/water_tanks
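
The trace back from the cache-id can be sketched as a scan over the cache-id files in the layerdb. As before, cache_id_to_layer is a name I made up, and the layout is the one observed in this walkthrough.

```shell
# Map a cache-id (the directory name under aufs/diff) back to its layer.
# $1: Docker root, $2: storage driver, $3: cache-id
# Prints the layerdb directory name and the diff ID; returns 1 if not found.
cache_id_to_layer() {
    local root="$1" driver="$2" cache_id="$3" dir
    for dir in "$root/image/$driver/layerdb/sha256"/*/; do
        [ -f "${dir}cache-id" ] && [ -f "${dir}diff" ] || continue
        if [ "$(cat "${dir}cache-id")" = "$cache_id" ]; then
            printf 'layerdb=%s diff_id=%s\n' "$(basename "$dir")" "$(cat "${dir}diff")"
            return 0
        fi
    done
    return 1
}

# Example, run as root:
#   cache_id_to_layer /var/lib/docker aufs c097799b7946231fb60511b442c10cd0b56ee17a12b376149f305adda67e7637
```

The diff ID printed here can then be matched against the Layers list from docker inspect to see which images use that layer.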