I have a Ceph cluster intended to run CephFS, backed by hard-drive enclosures that provide 9PiB of raw space in total across a number of servers.
I created a 3+3 erasure coding pool that is supposed to span the whole raw space of my hard drives.
Surprisingly, the pool seems to cover only 6PiB out of the 9PiB available: after writing ~2.5PiB of data into it (plus ~2.5PiB of erasure-coding parity), it says that I have only 500TiB of space available (corresponding to 1PiB of raw space).
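For reference, the pool's overhead can be confirmed from its erasure-code profile; a minimal check (the profile name below is a placeholder for whatever the first command returns):

$ ceph osd pool get cephfs_erdata erasure_code_profile
erasure_code_profile: <profile-name>
$ ceph osd erasure-code-profile get <profile-name>
# expect k=3 and m=3: each object is stored as 3 data + 3 coding chunks,
# so usable capacity is k/(k+m) = 50% of raw, i.e. at best ~4.5PiB out of ~9PiB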
Here's the output of ceph df:
$ sudo ceph df
RAW STORAGE:
CLASS SIZE AVAIL USED RAW USED %RAW USED
hdd 8.9 PiB 3.7 PiB 5.2 PiB 5.2 PiB 58.62
ssd 35 TiB 15 TiB 20 TiB 20 TiB 57.96
TOTAL 9.0 PiB 3.7 PiB 5.2 PiB 5.3 PiB 58.62
POOLS:
POOL ID STORED OBJECTS USED %USED MAX AVAIL
cephfs_metadata 7 5.1 GiB 1.55M 5.7 GiB 0.15 780 GiB
cephfs_erdata 8 2.5 PiB 687.98M 5.2 PiB 84.29 500 TiB
Note that the MAX AVAIL column in the POOLS section says that pool cephfs_erdata has only 500TiB left, while the AVAIL column for the hdd CLASS in the RAW STORAGE section shows 3.7PiB available.
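For what it's worth, both pool numbers are consistent with the 2x overhead of a 3+3 pool; a quick sanity check of the arithmetic (values copied from the ceph df output above):

$ # usable-to-raw ratio for k=3, m=3 is k/(k+m) = 0.5
$ echo "scale=2; 2.5 * (3+3)/3" | bc    # STORED in PiB -> raw PiB consumed (ceph df shows 5.2 PiB)
5.00
$ echo "scale=2; 500 * (3+3)/3" | bc    # MAX AVAIL in TiB -> raw TiB the pool can still use
1000.00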
What does that mean? Can I allocate more space for that pool? Why didn't Ceph itself allocate all the space available for it?
We found out the causes of this problem.
Due to a misconfiguration, our CephFS was using the ssd drives not only for storing metadata but for the actual data as well. CephFS runs out of space whenever one of its OSDs runs out of space and no more data can be placed on it, so the SSDs were the bottleneck for MAX_AVAIL.
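One way to keep the data pool off the ssd OSDs (a sketch of the general approach, not necessarily the exact commands we ran; the profile name, rule name and failure domain here are placeholders) is to give the pool a CRUSH rule restricted to the hdd device class:

$ ceph osd pool get cephfs_erdata crush_rule        # which rule the pool uses now
$ ceph osd erasure-code-profile set ec33hdd k=3 m=3 crush-failure-domain=host crush-device-class=hdd
$ ceph osd crush rule create-erasure ec33hdd_rule ec33hdd
$ ceph osd pool set cephfs_erdata crush_rule ec33hdd_rule
# note: switching the rule makes Ceph migrate any data currently sitting on ssd OSDs,
# which can mean a lot of backfill on a pool this size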
The hdd OSDs were not evenly loaded either, so we had to run a reweight. After that the data was distributed evenly and the MAX_AVAIL size approached AVAIL.
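For completeness, this is roughly the kind of check-and-reweight sequence involved (a sketch; the exact thresholds, and whether to use the balancer module instead, depend on the release and the cluster):

$ ceph osd df                               # per-OSD %USE, with MIN/MAX VAR and STDDEV at the bottom
$ ceph osd test-reweight-by-utilization     # dry run: shows which OSDs would get their reweight lowered
$ ceph osd reweight-by-utilization          # apply it; an over-utilization threshold can be passed as an argument
# on recent releases the balancer module can keep utilization even automatically:
$ ceph balancer mode upmap
$ ceph balancer on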