Given that docker uses ZFS and docker creates legacy datasets:
$ docker ps -a | wc -l
16
$ docker volume ls | wc -l
12
$ zfs list | grep legacy | wc -l
157
16 containers (both running and stopped). 12 volumes. 157 datasets. This seems like an awful lot of legacy datasets. I'm wondering if a lot of them are so orphaned that not even docker
knows about them anymore, so they don't get cleaned up.
There is a huge list of legacy datasets in my Debian ZFS pool. They started appearing when I started using Docker on this machine:
$ sudo zfs list | grep legacy | wc -l
486
They are all in the form of:
pool/var/<64-char-hash> 202K 6,18T 818M legacy
This location is used solely by docker.
$ docker info | grep -e Storage -e Dataset
Storage Driver: zfs
Parent Dataset: pool/var
I started cleaning up.
$ docker system prune -a
(...)
$ sudo zfs list | grep legacy | wc -l
154
That's better. However, I'm only running about 15 containers, and after running docker system prune -a, the history of every container shows that only the last image layer is still available. The rest are <missing> (because they were cleaned up).
$ docker images | wc -l
15
If all containers use only the last image layer after pruning the rest, shouldn't docker only use 15 image layers and 15 running containers, totalling 30 datasets?
$ sudo zfs list | grep legacy | wc -l
154
Can I find out if they are in use by a container/image? Is there a command that traverses all pool/var/<hash> datasets in ZFS and figures out to which docker container/image they belong? Either a lot of them can be removed, or I don't understand how to verify (beyond just trusting docker system prune) that they cannot be.
The excessive number of ZFS datasets created by docker messes up my zfs list output, both visually and performance-wise. Listing datasets now takes ~10 seconds instead of <1.
$ docker ps -qa --no-trunc --filter "status=exited"
(no output)
$ docker images --filter "dangling=true" -q --no-trunc
(no output)
$ docker volume ls -qf dangling=true
(no output)
zfs list
example:
NAME USED AVAIL REFER MOUNTPOINT
pool 11,8T 5,81T 128K /pool
pool/var 154G 5,81T 147G /mnt/var
pool/var/0028ab70abecb2e052d1b7ffc4fdccb74546350d33857894e22dcde2ed592c1c 1,43M 5,81T 1,42M legacy
pool/var/0028ab70abecb2e052d1b7ffc4fdccb74546350d33857894e22dcde2ed592c1c@211422332 10,7K - 1,42M -
# and 150 more of the last two with different hashes
I had the same question but couldn't find a satisfactory answer. Adding what I eventually found, since this question is one of the top search results.
The ZFS storage driver for Docker stores each layer of each image as a separate legacy dataset.
Even just a handful of images can result in a huge number of layers, each layer corresponding to a legacy
ZFS dataset.
The base layer of an image is a ZFS filesystem. Each child layer is a ZFS clone based on a ZFS snapshot of the layer below it. A container is a ZFS clone based on a ZFS snapshot of the top layer of the image it's created from.
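You can see this parent/child structure directly on the ZFS side: for cloned datasets, the origin property names the snapshot each one was cloned from. The dataset names below are placeholders, not real output:
$ zfs get -r -t filesystem -o name,value origin pool/var
NAME VALUE
pool/var -
pool/var/<base-layer-hash> -
pool/var/<child-layer-hash> pool/var/<base-layer-hash>@<snapshot>
pool/var/<container-hash> pool/var/<top-image-layer-hash>@<snapshot>
A base layer has no origin; every other layer and every container points at a snapshot of its parent.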
You can check the layers that make up an image (each backed by one of these legacy datasets) by running:
$ docker image inspect [IMAGE_NAME]
Example output:
...
"RootFS": {
    "Type": "layers",
    "Layers": [
        "sha256:f2cb0ecef392f2a630fa1205b874ab2e2aedf96de04d0b8838e4e728e28142da",
        ...
        "sha256:2e8cc9f5313f9555a4decca744655ed461e21fbe48a0f078ed5f7c4e5292ad2e"
    ]
},
...
This explains why you can see 150+ datasets created when only running a dozen containers.
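To map in the other direction, from a container or image to the dataset backing it, recent Docker versions with the zfs driver report the dataset name under the GraphDriver section of docker inspect output. Treat the following as a rough sketch under that assumption (check your own docker inspect output for the exact field names) rather than a definitive cleanup tool:
# collect the top-layer dataset of every container and image Docker still knows about
for id in $(docker ps -aq) $(docker images -q); do
    docker inspect -f '{{ index .GraphDriver.Data "Dataset" }}' "$id"
done | sort -u > /tmp/docker-datasets

# everything ZFS has under the parent dataset
zfs list -H -o name -r pool/var | sort > /tmp/zfs-datasets

# datasets present in ZFS but not referenced as a top layer by Docker
comm -23 /tmp/zfs-datasets /tmp/docker-datasets
Note that intermediate image layers never show up as a top layer, so datasets listed by the last command may still be parents that Docker needs; this tells you where to look, not what is safe to zfs destroy.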
Prune and delete unused images.
$ docker image prune -a
To avoid a slow zfs list, specify the dataset of interest. Suppose you store docker in tank/docker and other files in tank/data. List only the data datasets with the recursive option:
# recursively list tank/data/*
$ zfs list tank/data -r
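If you still want a pool-wide overview, another option is to limit the recursion depth so the per-layer datasets under the docker parent dataset stay hidden (assuming your ZFS version supports the -d flag for zfs list, which current OpenZFS releases do):
# show the pool and its direct children only, hiding the per-layer datasets below
$ zfs list -d 1 pool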