ceph cephfs

CephFS pool can't use all the available raw space (MAX_AVAIL < AVAIL)


I have a Ceph cluster intended to run CephFS on hard drive enclosures that together provide 9PiB of raw space across a number of servers.

I created a 3+3 erasure coding pool that is supposed to span the whole raw space of my hard drives.

Surprisingly, it seems to be able to occupy only 6PiB of the 9PiB available: after writing ~2.5PiB of data into it (plus ~2.5PiB of erasure-coding parity), it reports only 500TiB of space still available (corresponding to 1PiB of raw space).
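
To spell out that arithmetic: a 3+3 profile stores 3 data chunks plus 3 parity chunks per object, so usable space is half of the raw space. A quick check with round numbers:

$ echo $(( (3 + 3) / 3 ))    # raw-space multiplier of a 3+3 EC pool
2
$ echo $(( 500 * 2 ))        # 500TiB usable therefore corresponds to ~1000TiB ~= 1PiB raw
1000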

Here's the output of ceph df:

$ sudo ceph df
RAW STORAGE:
    CLASS     SIZE        AVAIL       USED        RAW USED     %RAW USED 
    hdd       8.9 PiB     3.7 PiB     5.2 PiB      5.2 PiB         58.62 
    ssd        35 TiB      15 TiB      20 TiB       20 TiB         57.96 
    TOTAL     9.0 PiB     3.7 PiB     5.2 PiB      5.3 PiB         58.62 
 
POOLS:
    POOL                ID     STORED      OBJECTS     USED        %USED     MAX AVAIL 
    cephfs_metadata      7     5.1 GiB       1.55M     5.7 GiB      0.15       780 GiB 
    cephfs_erdata        8     2.5 PiB     687.98M     5.2 PiB     84.29       500 TiB

Note that the MAX AVAIL column in the POOLS section shows only 500TiB left for pool cephfs_erdata, while the AVAIL column for the hdd class in the RAW STORAGE section shows 3.7PiB.

What does that mean? Can I allocate more space for that pool? Why didn't Ceph itself allocate all the space available for it?
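
For reference, the pool's CRUSH placement and the per-OSD fill levels can be inspected roughly like this (<profile-name> is a placeholder for whatever the previous command prints; output formats vary between Ceph releases):

$ sudo ceph osd pool get cephfs_erdata crush_rule
$ sudo ceph osd pool get cephfs_erdata erasure_code_profile
$ sudo ceph osd erasure-code-profile get <profile-name>   # look for crush-device-class
$ sudo ceph osd df tree                                   # MAX AVAIL is limited by the fullest OSDs the rule can use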


Solution

  • We found the causes of this problem:

    1. Due to a misconfiguration, our CephFS was using the SSD drives not only for storing metadata but for the actual data as well. CephFS runs out of space as soon as one of its OSDs fills up and can't accept any more data, so the nearly full SSDs were what capped MAX_AVAIL (see the first sketch after this list for one way to check and correct the pool's device-class placement).

    2. Even the hdd OSDs were not evenly loaded, so we had to reweight them. After that the data was distributed evenly and MAX_AVAIL approached AVAIL (the second sketch after this list shows one way to run such a reweight).
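
Here is a minimal sketch of how the first point can be checked and fixed. The profile and rule names (ec33hdd, ec33hdd_rule) are only examples, host is assumed to be the intended failure domain, and repointing a pool of this size at a new CRUSH rule triggers a large amount of backfill:

$ sudo ceph osd pool get cephfs_erdata crush_rule
$ sudo ceph osd crush rule dump <rule-name>     # an hdd-restricted rule takes from the "default~hdd" shadow root

# Create an hdd-only 3+3 profile, derive a CRUSH rule from it, and switch the pool over
$ sudo ceph osd erasure-code-profile set ec33hdd k=3 m=3 crush-failure-domain=host crush-device-class=hdd
$ sudo ceph osd crush rule create-erasure ec33hdd_rule ec33hdd
$ sudo ceph osd pool set cephfs_erdata crush_rule ec33hdd_rule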
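
And a sketch of the reweighting step from the second point, using the built-in utilization-based reweight. The value 110 is just an example threshold (reweight OSDs that are more than 110% as full as the average); on recent releases the balancer module is an alternative:

$ sudo ceph osd df                                   # VAR/STDDEV columns show how uneven the OSDs are
$ sudo ceph osd test-reweight-by-utilization 110     # dry run: report what would change
$ sudo ceph osd reweight-by-utilization 110          # apply the reweight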