I'm currently working with Rook v1.2.2 to create a Ceph cluster on my Kubernetes cluster (v1.16.3), and I'm failing to add a rack level to my CRUSH map.
I want to go from:
ID CLASS WEIGHT TYPE NAME
-1 0.02737 root default
-3 0.01369 host test-w1
0 hdd 0.01369 osd.0
-5 0.01369 host test-w2
1 hdd 0.01369 osd.1
to something like:
ID CLASS WEIGHT TYPE NAME STATUS REWEIGHT PRI-AFF
-1 0.01358 root default
-5 0.01358 zone zone1
-4 0.01358 rack rack1
-3 0.01358 host mynode
0 hdd 0.00679 osd.0 up 1.00000 1.00000
1 hdd 0.00679 osd.1 up 1.00000 1.00000
as explained in the official Rook doc (https://rook.io/docs/rook/v1.2/ceph-cluster-crd.html#osd-topology).
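Note that the target layout above also includes a zone level; with Rook v1.2 that level comes from the Kubernetes zone label (the exact keys are listed in the linked doc; on clusters of this era it should be the failure-domain.beta.kubernetes.io/zone label). A hypothetical example:
kubectl label node test-w1 failure-domain.beta.kubernetes.io/zone=zone1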
Steps I followed:
I have a v1.16.3 Kubernetes cluster with one master (test-m1) and two workers (test-w1 and test-w2). I installed this cluster using the default configuration of Kubespray (https://kubespray.io/#/docs/getting-started).
I labeled my nodes with:
kubectl label node test-w1 topology.rook.io/rack=rack1
kubectl label node test-w2 topology.rook.io/rack=rack2
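To double-check that the labels are actually set, the rack label can be printed as a column (plain kubectl, just a sanity check):
kubectl get nodes -L topology.rook.io/rack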
I also added the label role=storage-node and the taint storage-node=true:NoSchedule to force Rook to run only on the dedicated storage nodes.
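For example:
kubectl label node test-w1 role=storage-node
kubectl taint node test-w1 storage-node=true:NoSchedule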
Here is the full set of labels and taints for one storage node:
Name:               test-w1
Roles:              <none>
Labels:             beta.kubernetes.io/arch=amd64
                    beta.kubernetes.io/os=linux
                    kubernetes.io/arch=amd64
                    kubernetes.io/hostname=test-w1
                    kubernetes.io/os=linux
                    role=storage-node
                    topology.rook.io/rack=rack1
Annotations:        csi.volume.kubernetes.io/nodeid: {"rook-ceph.cephfs.csi.ceph.com":"test-w1","rook-ceph.rbd.csi.ceph.com":"test-w1"}
                    kubeadm.alpha.kubernetes.io/cri-socket: /var/run/dockershim.sock
                    node.alpha.kubernetes.io/ttl: 0
                    volumes.kubernetes.io/controller-managed-attach-detach: true
CreationTimestamp:  Wed, 29 Jan 2020 03:38:52 +0100
Taints:             storage-node=true:NoSchedule
I started by deploying Rook's common.yaml: https://github.com/rook/rook/blob/master/cluster/examples/kubernetes/ceph/common.yaml
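Applied as-is:
kubectl create -f common.yaml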
Then I applied a custom operator.yml file so that the operator, CSI plugins, and agent run on the nodes labeled role=storage-node:
#################################################################################################################
# The deployment for the rook operator
# Contains the common settings for most Kubernetes deployments.
# For example, to create the rook-ceph cluster:
#   kubectl create -f common.yaml
#   kubectl create -f operator.yaml
#   kubectl create -f cluster.yaml
#
# Also see other operator sample files for variations of operator.yaml:
# - operator-openshift.yaml: Common settings for running in OpenShift
#################################################################################################################
# OLM: BEGIN OPERATOR DEPLOYMENT
apiVersion: apps/v1
kind: Deployment
metadata:
  name: rook-ceph-operator
  namespace: rook-ceph
  labels:
    operator: rook
    storage-backend: ceph
spec:
  selector:
    matchLabels:
      app: rook-ceph-operator
  replicas: 1
  template:
    metadata:
      labels:
        app: rook-ceph-operator
    spec:
      serviceAccountName: rook-ceph-system
      affinity:
        nodeAffinity:
          requiredDuringSchedulingIgnoredDuringExecution:
            nodeSelectorTerms:
            - matchExpressions:
              - key: role
                operator: In
                values:
                - storage-node
      tolerations:
      - key: "storage-node"
        operator: "Exists"
        effect: "NoSchedule"
      containers:
      - name: rook-ceph-operator
        image: rook/ceph:v1.2.2
        args: ["ceph", "operator"]
        volumeMounts:
        - mountPath: /var/lib/rook
          name: rook-config
        - mountPath: /etc/ceph
          name: default-config-dir
        env:
        # If the operator should only watch for cluster CRDs in the same namespace, set this to "true".
        # If this is not set to true, the operator will watch for cluster CRDs in all namespaces.
        - name: ROOK_CURRENT_NAMESPACE_ONLY
          value: "false"
        # To disable RBAC, uncomment the following:
        # - name: RBAC_ENABLED
        #   value: "false"
        # Rook Agent toleration. Will tolerate all taints with all keys.
        # Choose between NoSchedule, PreferNoSchedule and NoExecute:
        # - name: AGENT_TOLERATION
        #   value: "NoSchedule"
        # (Optional) Rook Agent toleration key. Set this to the key of the taint you want to tolerate
        # - name: AGENT_TOLERATION_KEY
        #   value: "storage-node"
        # (Optional) Rook Agent tolerations list. Put here list of taints you want to tolerate in YAML format.
        - name: AGENT_TOLERATIONS
          value: |
            - effect: NoSchedule
              key: storage-node
              operator: Exists
        # (Optional) Rook Agent priority class name to set on the pod(s)
        # - name: AGENT_PRIORITY_CLASS_NAME
        #   value: "<PriorityClassName>"
        # (Optional) Rook Agent NodeAffinity.
        - name: AGENT_NODE_AFFINITY
          value: "role=storage-node"
        # (Optional) Rook Agent mount security mode. Can be `Any` or `Restricted`.
        # `Any` uses Ceph admin credentials by default/fallback.
        # For using `Restricted` you must have a Ceph secret in each namespace storage should be consumed from,
        # and set `mountUser` to the Ceph user and `mountSecret` to the name of the Kubernetes secret in that namespace.
        # - name: AGENT_MOUNT_SECURITY_MODE
        #   value: "Any"
        # Set the path where the Rook agent can find the flex volumes
        # - name: FLEXVOLUME_DIR_PATH
        #   value: "<PathToFlexVolumes>"
        # Set the path where kernel modules can be found
        # - name: LIB_MODULES_DIR_PATH
        #   value: "<PathToLibModules>"
        # Mount any extra directories into the agent container
        # - name: AGENT_MOUNTS
        #   value: "somemount=/host/path:/container/path,someothermount=/host/path2:/container/path2"
        # Rook Discover toleration. Will tolerate all taints with all keys.
        # Choose between NoSchedule, PreferNoSchedule and NoExecute:
        # - name: DISCOVER_TOLERATION
        #   value: "NoSchedule"
        # (Optional) Rook Discover toleration key. Set this to the key of the taint you want to tolerate
        # - name: DISCOVER_TOLERATION_KEY
        #   value: "storage-node"
        # (Optional) Rook Discover tolerations list. Put here list of taints you want to tolerate in YAML format.
        - name: DISCOVER_TOLERATIONS
          value: |
            - effect: NoSchedule
              key: storage-node
              operator: Exists
        # (Optional) Rook Discover priority class name to set on the pod(s)
        # - name: DISCOVER_PRIORITY_CLASS_NAME
        #   value: "<PriorityClassName>"
        # (Optional) Discover Agent NodeAffinity.
        - name: DISCOVER_AGENT_NODE_AFFINITY
          value: "role=storage-node"
        # Allow rook to create multiple file systems. Note: This is considered
        # an experimental feature in Ceph as described at
        # http://docs.ceph.com/docs/master/cephfs/experimental-features/#multiple-filesystems-within-a-ceph-cluster
        # which might cause mons to crash as seen in https://github.com/rook/rook/issues/1027
        - name: ROOK_ALLOW_MULTIPLE_FILESYSTEMS
          value: "false"
        # The logging level for the operator: INFO | DEBUG
        - name: ROOK_LOG_LEVEL
          value: "INFO"
        # The interval to check the health of the ceph cluster and update the status in the custom resource.
        - name: ROOK_CEPH_STATUS_CHECK_INTERVAL
          value: "60s"
        # The interval to check if every mon is in the quorum.
        - name: ROOK_MON_HEALTHCHECK_INTERVAL
          value: "45s"
        # The duration to wait before trying to failover or remove/replace the
        # current mon with a new mon (useful for compensating flapping network).
        - name: ROOK_MON_OUT_TIMEOUT
          value: "600s"
        # The duration between discovering devices in the rook-discover daemonset.
        - name: ROOK_DISCOVER_DEVICES_INTERVAL
          value: "60m"
        # Whether to start pods as privileged that mount a host path, which includes the Ceph mon and osd pods.
        # This is necessary to workaround the anyuid issues when running on OpenShift.
        # For more details see https://github.com/rook/rook/issues/1314#issuecomment-355799641
        - name: ROOK_HOSTPATH_REQUIRES_PRIVILEGED
          value: "false"
        # In some situations SELinux relabelling breaks (times out) on large filesystems, and doesn't work with cephfs ReadWriteMany volumes (last relabel wins).
        # Disable it here if you have similar issues.
        # For more details see https://github.com/rook/rook/issues/2417
        - name: ROOK_ENABLE_SELINUX_RELABELING
          value: "true"
        # In large volumes it will take some time to chown all the files. Disable it here if you have performance issues.
        # For more details see https://github.com/rook/rook/issues/2254
        - name: ROOK_ENABLE_FSGROUP
          value: "true"
        # Disable automatic orchestration when new devices are discovered
        - name: ROOK_DISABLE_DEVICE_HOTPLUG
          value: "false"
        # Provide a customised regex for device blacklisting as comma-separated values, e.g. a regex for rbd-based volumes would be "(?i)rbd[0-9]+".
        # In case of more than one regex, use a comma to separate them.
        # Default regex will be "(?i)dm-[0-9]+,(?i)rbd[0-9]+,(?i)nbd[0-9]+"
        # Add a regex expression after a comma to blacklist a disk.
        # If the value is empty, the default regex will be used.
        - name: DISCOVER_DAEMON_UDEV_BLACKLIST
          value: "(?i)dm-[0-9]+,(?i)rbd[0-9]+,(?i)nbd[0-9]+"
        # Whether to enable the flex driver. By default it is enabled and is fully supported, but will be deprecated in some future release
        # in favor of the CSI driver.
        - name: ROOK_ENABLE_FLEX_DRIVER
          value: "false"
        # Whether to start the discovery daemon to watch for raw storage devices on nodes in the cluster.
        # This daemon does not need to run if you are only going to create your OSDs based on StorageClassDeviceSets with PVCs.
        - name: ROOK_ENABLE_DISCOVERY_DAEMON
          value: "true"
        # Enable the default version of the CSI CephFS driver. To start another version of the CSI driver, see image properties below.
        - name: ROOK_CSI_ENABLE_CEPHFS
          value: "true"
        # Enable the default version of the CSI RBD driver. To start another version of the CSI driver, see image properties below.
        - name: ROOK_CSI_ENABLE_RBD
          value: "true"
        - name: ROOK_CSI_ENABLE_GRPC_METRICS
          value: "true"
        # Enable deployment of snapshotter container in ceph-csi provisioner.
        - name: CSI_ENABLE_SNAPSHOTTER
          value: "true"
        # Enable Ceph kernel clients on kernels < 4.17 which support quotas for CephFS.
        # If you disable the kernel client, your application may be disrupted during upgrade.
        # See the upgrade guide: https://rook.io/docs/rook/v1.2/ceph-upgrade.html
        - name: CSI_FORCE_CEPHFS_KERNEL_CLIENT
          value: "true"
        # CSI CephFS plugin daemonset update strategy, supported values are OnDelete and RollingUpdate.
        # Default value is RollingUpdate.
        # - name: CSI_CEPHFS_PLUGIN_UPDATE_STRATEGY
        #   value: "OnDelete"
        # CSI RBD plugin daemonset update strategy, supported values are OnDelete and RollingUpdate.
        # Default value is RollingUpdate.
        # - name: CSI_RBD_PLUGIN_UPDATE_STRATEGY
        #   value: "OnDelete"
        # The default version of CSI supported by Rook will be started. To change the version
        # of the CSI driver to something other than what is officially supported, change
        # these images to the desired release of the CSI driver.
        # - name: ROOK_CSI_CEPH_IMAGE
        #   value: "quay.io/cephcsi/cephcsi:v1.2.2"
        # - name: ROOK_CSI_REGISTRAR_IMAGE
        #   value: "quay.io/k8scsi/csi-node-driver-registrar:v1.1.0"
        # - name: ROOK_CSI_PROVISIONER_IMAGE
        #   value: "quay.io/k8scsi/csi-provisioner:v1.4.0"
        # - name: ROOK_CSI_SNAPSHOTTER_IMAGE
        #   value: "quay.io/k8scsi/csi-snapshotter:v1.2.2"
        # - name: ROOK_CSI_ATTACHER_IMAGE
        #   value: "quay.io/k8scsi/csi-attacher:v1.2.0"
        # kubelet directory path, if kubelet is configured to use a path other than /var/lib/kubelet.
        # - name: ROOK_CSI_KUBELET_DIR_PATH
        #   value: "/var/lib/kubelet"
        # (Optional) Ceph Provisioner NodeAffinity.
        - name: CSI_PROVISIONER_NODE_AFFINITY
          value: "role=storage-node"
        # (Optional) Ceph CSI provisioner tolerations list. Put here list of taints you want to tolerate in YAML format.
        # The CSI provisioner is best started on the same nodes as the other ceph daemons.
        - name: CSI_PROVISIONER_TOLERATIONS
          value: |
            - effect: NoSchedule
              key: storage-node
              operator: Exists
        # (Optional) Ceph CSI plugin NodeAffinity.
        - name: CSI_PLUGIN_NODE_AFFINITY
          value: "role=storage-node"
        # (Optional) Ceph CSI plugin tolerations list. Put here list of taints you want to tolerate in YAML format.
        # CSI plugins need to be started on all the nodes where the clients need to mount the storage.
        - name: CSI_PLUGIN_TOLERATIONS
          value: |
            - effect: NoSchedule
              key: storage-node
              operator: Exists
        # Configure CSI CephFS grpc and liveness metrics ports
        # - name: CSI_CEPHFS_GRPC_METRICS_PORT
        #   value: "9091"
        # - name: CSI_CEPHFS_LIVENESS_METRICS_PORT
        #   value: "9081"
        # Configure CSI RBD grpc and liveness metrics ports
        # - name: CSI_RBD_GRPC_METRICS_PORT
        #   value: "9090"
        # - name: CSI_RBD_LIVENESS_METRICS_PORT
        #   value: "9080"
        # Time to wait until the node controller will move Rook pods to other
        # nodes after detecting an unreachable node.
        # Pods affected by this setting are:
        # mgr, rbd, mds, rgw, nfs, PVC based mons and osds, and ceph toolbox
        # The value used in this variable replaces the default value of 300 secs
        # added automatically by k8s as Toleration for
        # <node.kubernetes.io/unreachable>
        # The total amount of time to reschedule Rook pods in healthy nodes
        # before detecting a <not ready node> condition will be the sum of:
        #  --> node-monitor-grace-period: 40 seconds (k8s kube-controller-manager flag)
        #  --> ROOK_UNREACHABLE_NODE_TOLERATION_SECONDS: 5 seconds
        - name: ROOK_UNREACHABLE_NODE_TOLERATION_SECONDS
          value: "5"
        # The name of the node to pass with the downward API
        - name: NODE_NAME
          valueFrom:
            fieldRef:
              fieldPath: spec.nodeName
        # The pod name to pass with the downward API
        - name: POD_NAME
          valueFrom:
            fieldRef:
              fieldPath: metadata.name
        # The pod namespace to pass with the downward API
        - name: POD_NAMESPACE
          valueFrom:
            fieldRef:
              fieldPath: metadata.namespace
      # Uncomment it to run the rook operator on the host network
      # hostNetwork: true
      volumes:
      - name: rook-config
        emptyDir: {}
      - name: default-config-dir
        emptyDir: {}
# OLM: END OPERATOR DEPLOYMENT
Then I applied my own custom ceph-cluster.yml file to allow the pods to run on the nodes labeled role=storage-node:
#################################################################################################################
# Define the settings for the rook-ceph cluster with settings that should only be used in a test environment.
# A single filestore OSD will be created in the dataDirHostPath.
# For example, to create the cluster:
#   kubectl create -f common.yaml
#   kubectl create -f operator.yaml
#   kubectl create -f ceph-cluster.yaml
#################################################################################################################
apiVersion: ceph.rook.io/v1
kind: CephCluster
metadata:
  name: rook-ceph
  namespace: rook-ceph
spec:
  cephVersion:
    image: ceph/ceph:v14.2.5
    allowUnsupported: true
  dataDirHostPath: /var/lib/rook
  skipUpgradeChecks: false
  mon:
    count: 1
    allowMultiplePerNode: true
  dashboard:
    enabled: true
    ssl: true
  monitoring:
    enabled: false # requires Prometheus to be pre-installed
    rulesNamespace: rook-ceph
  network:
    hostNetwork: false
  rbdMirroring:
    workers: 0
  placement:
    all:
      nodeAffinity:
        requiredDuringSchedulingIgnoredDuringExecution:
          nodeSelectorTerms:
          - matchExpressions:
            - key: role
              operator: In
              values:
              - storage-node
      tolerations:
      - key: "storage-node"
        operator: "Exists"
        effect: "NoSchedule"
  mgr:
    modules:
    # the pg_autoscaler is only available on nautilus or newer. remove this if testing mimic.
    - name: pg_autoscaler
      enabled: true
  storage:
    useAllNodes: false
    useAllDevices: false
    nodes:
    - name: "test-w1"
      directories:
      - path: /var/lib/rook
    - name: "test-w2"
      directories:
      - path: /var/lib/rook
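I applied it the same way and waited for the OSD pods to come up (plain kubectl, for example):
kubectl create -f ceph-cluster.yml
kubectl -n rook-ceph get pods -w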
With this configuration, Rook does not apply the labels to the CRUSH map. If I install the toolbox.yml (https://rook.io/docs/rook/v1.2/ceph-toolbox.html), exec into the toolbox pod, and run
ceph osd tree
ceph osd crush tree
I get the following output:
ID CLASS WEIGHT TYPE NAME
-1 0.02737 root default
-3 0.01369 host test-w1
0 hdd 0.01369 osd.0
-5 0.01369 host test-w2
1 hdd 0.01369 osd.1
As you can see, no rack is defined, even though I labeled my nodes correctly.
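As a side note, while debugging it is possible to create and populate the rack buckets by hand from the toolbox (a manual workaround only, not a fix for the Rook behavior; the bucket and host names here match my cluster):
kubectl -n rook-ceph exec -it $(kubectl -n rook-ceph get pod -l "app=rook-ceph-tools" -o jsonpath='{.items[0].metadata.name}') -- bash
ceph osd crush add-bucket rack1 rack
ceph osd crush move rack1 root=default
ceph osd crush move test-w1 rack=rack1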
What is surprising is that the prepare-osd pods do retrieve the rack information, as the first line of the following logs shows:
$ kubectl logs rook-ceph-osd-prepare-test-w1-7cp4f -n rook-ceph
2020-01-29 09:59:07.272649 I | cephcmd: crush location of osd: root=default host=test-w1 rack=rack1
[couppayy@test-m1 test_local]$ cat preposd.txt
2020-01-29 09:59:07.155656 I | cephcmd: desired devices to configure osds: [{Name: OSDsPerDevice:1 MetadataDevice: DatabaseSizeMB:0 DeviceClass: IsFilter:false IsDevicePathFilter:false}]
2020-01-29 09:59:07.185024 I | rookcmd: starting Rook v1.2.2 with arguments '/rook/rook ceph osd provision'
2020-01-29 09:59:07.185069 I | rookcmd: flag values: --cluster-id=c9ee638a-1d02-4ad9-95c9-cb796f61623a, --data-device-filter=, --data-device-path-filter=, --data-devices=, --data-directories=/var/lib/rook, --encrypted-device=false, --force-format=false, --help=false, --location=, --log-flush-frequency=5s, --log-level=INFO, --metadata-device=, --node-name=test-w1, --operator-image=, --osd-database-size=0, --osd-journal-size=5120, --osd-store=, --osd-wal-size=576, --osds-per-device=1, --pvc-backed-osd=false, --service-account=
2020-01-29 09:59:07.185108 I | op-mon: parsing mon endpoints: a=10.233.35.212:6789
2020-01-29 09:59:07.272603 I | op-osd: CRUSH location=root=default host=test-w1 rack=rack1
2020-01-29 09:59:07.272649 I | cephcmd: crush location of osd: root=default host=test-w1 rack=rack1
2020-01-29 09:59:07.313099 I | cephconfig: writing config file /var/lib/rook/rook-ceph/rook-ceph.config
2020-01-29 09:59:07.313397 I | cephconfig: generated admin config in /var/lib/rook/rook-ceph
2020-01-29 09:59:07.322175 I | cephosd: discovering hardware
2020-01-29 09:59:07.322228 I | exec: Running command: lsblk --all --noheadings --list --output KNAME
2020-01-29 09:59:07.365036 I | exec: Running command: lsblk /dev/sda --bytes --nodeps --pairs --output SIZE,ROTA,RO,TYPE,PKNAME
2020-01-29 09:59:07.416812 W | inventory: skipping device sda: Failed to complete 'lsblk /dev/sda': exit status 1. lsblk: /dev/sda: not a block device
2020-01-29 09:59:07.416873 I | exec: Running command: lsblk /dev/sda1 --bytes --nodeps --pairs --output SIZE,ROTA,RO,TYPE,PKNAME
2020-01-29 09:59:07.450851 W | inventory: skipping device sda1: Failed to complete 'lsblk /dev/sda1': exit status 1. lsblk: /dev/sda1: not a block device
2020-01-29 09:59:07.450892 I | exec: Running command: lsblk /dev/sda2 --bytes --nodeps --pairs --output SIZE,ROTA,RO,TYPE,PKNAME
2020-01-29 09:59:07.457890 W | inventory: skipping device sda2: Failed to complete 'lsblk /dev/sda2': exit status 1. lsblk: /dev/sda2: not a block device
2020-01-29 09:59:07.457934 I | exec: Running command: lsblk /dev/sr0 --bytes --nodeps --pairs --output SIZE,ROTA,RO,TYPE,PKNAME
2020-01-29 09:59:07.503758 W | inventory: skipping device sr0: Failed to complete 'lsblk /dev/sr0': exit status 1. lsblk: /dev/sr0: not a block device
2020-01-29 09:59:07.503793 I | cephosd: creating and starting the osds
2020-01-29 09:59:07.543504 I | cephosd: configuring osd devices: {"Entries":{}}
2020-01-29 09:59:07.543554 I | exec: Running command: ceph-volume lvm batch --prepare
2020-01-29 09:59:08.906271 I | cephosd: no more devices to configure
2020-01-29 09:59:08.906311 I | exec: Running command: ceph-volume lvm list --format json
2020-01-29 09:59:10.841568 I | cephosd: 0 ceph-volume osd devices configured on this node
2020-01-29 09:59:10.841595 I | cephosd: devices = []
2020-01-29 09:59:10.847396 I | cephosd: configuring osd dirs: map[/var/lib/rook:-1]
2020-01-29 09:59:10.848011 I | exec: Running command: ceph osd create 652071c9-2cdb-4df9-a20e-813738c4e3f6 --connect-timeout=15 --cluster=rook-ceph --conf=/var/lib/rook/rook-ceph/rook-ceph.config --keyring=/var/lib/rook/rook-ceph/client.admin.keyring --format json --out-file /tmp/851021116
2020-01-29 09:59:14.213679 I | cephosd: successfully created OSD 652071c9-2cdb-4df9-a20e-813738c4e3f6 with ID 0
2020-01-29 09:59:14.213744 I | cephosd: osd.0 appears to be new, cleaning the root dir at /var/lib/rook/osd0
2020-01-29 09:59:14.214417 I | cephconfig: writing config file /var/lib/rook/osd0/rook-ceph.config
2020-01-29 09:59:14.214653 I | exec: Running command: ceph auth get-or-create osd.0 -o /var/lib/rook/osd0/keyring osd allow * mon allow profile osd --connect-timeout=15 --cluster=rook-ceph --conf=/var/lib/rook/rook-ceph/rook-ceph.config --keyring=/var/lib/rook/rook-ceph/client.admin.keyring --format plain
2020-01-29 09:59:17.189996 I | cephosd: Initializing OSD 0 file system at /var/lib/rook/osd0...
2020-01-29 09:59:17.194681 I | exec: Running command: ceph mon getmap --connect-timeout=15 --cluster=rook-ceph --conf=/var/lib/rook/rook-ceph/rook-ceph.config --keyring=/var/lib/rook/rook-ceph/client.admin.keyring --format json --out-file /tmp/298283883
2020-01-29 09:59:20.936868 I | exec: got monmap epoch 1
2020-01-29 09:59:20.937380 I | exec: Running command: ceph-osd --mkfs --id=0 --cluster=rook-ceph --conf=/var/lib/rook/osd0/rook-ceph.config --osd-data=/var/lib/rook/osd0 --osd-uuid=652071c9-2cdb-4df9-a20e-813738c4e3f6 --monmap=/var/lib/rook/osd0/tmp/activate.monmap --keyring=/var/lib/rook/osd0/keyring --osd-journal=/var/lib/rook/osd0/journal
2020-01-29 09:59:21.324912 I | mkfs-osd0: 2020-01-29 09:59:21.323 7fc7e2a8ea80 -1 journal FileJournal::_open: disabling aio for non-block journal. Use journal_force_aio to force use of aio anyway
2020-01-29 09:59:21.386136 I | mkfs-osd0: 2020-01-29 09:59:21.384 7fc7e2a8ea80 -1 journal FileJournal::_open: disabling aio for non-block journal. Use journal_force_aio to force use of aio anyway
2020-01-29 09:59:21.387553 I | mkfs-osd0: 2020-01-29 09:59:21.384 7fc7e2a8ea80 -1 journal do_read_entry(4096): bad header magic
2020-01-29 09:59:21.387585 I | mkfs-osd0: 2020-01-29 09:59:21.384 7fc7e2a8ea80 -1 journal do_read_entry(4096): bad header magic
2020-01-29 09:59:21.450639 I | cephosd: Config file /var/lib/rook/osd0/rook-ceph.config:
[global]
fsid = a19423a1-f135-446f-b4d9-f52da10a935f
mon initial members = a
mon host = v1:10.233.35.212:6789
public addr = 10.233.95.101
cluster addr = 10.233.95.101
mon keyvaluedb = rocksdb
mon_allow_pool_delete = true
mon_max_pg_per_osd = 1000
debug default = 0
debug rados = 0
debug mon = 0
debug osd = 0
debug bluestore = 0
debug filestore = 0
debug journal = 0
debug leveldb = 0
filestore_omap_backend = rocksdb
osd pg bits = 11
osd pgp bits = 11
osd pool default size = 1
osd pool default pg num = 100
osd pool default pgp num = 100
osd max object name len = 256
osd max object namespace len = 64
osd objectstore = filestore
rbd_default_features = 3
fatal signal handlers = false
[osd.0]
keyring = /var/lib/rook/osd0/keyring
osd journal size = 5120
2020-01-29 09:59:21.450723 I | cephosd: completed preparing osd &{ID:0 DataPath:/var/lib/rook/osd0 Config:/var/lib/rook/osd0/rook-ceph.config Cluster:rook-ceph KeyringPath:/var/lib/rook/osd0/keyring UUID:652071c9-2cdb-4df9-a20e-813738c4e3f6 Journal:/var/lib/rook/osd0/journal IsFileStore:true IsDirectory:true DevicePartUUID: CephVolumeInitiated:false LVPath: SkipLVRelease:false Location: LVBackedPV:false}
2020-01-29 09:59:21.450743 I | cephosd: 1/1 osd dirs succeeded on this node
2020-01-29 09:59:21.450755 I | cephosd: saving osd dir map
2020-01-29 09:59:21.479301 I | cephosd: device osds:[]
dir osds: [{ID:0 DataPath:/var/lib/rook/osd0 Config:/var/lib/rook/osd0/rook-ceph.config Cluster:rook-ceph KeyringPath:/var/lib/rook/osd0/keyring UUID:652071c9-2cdb-4df9-a20e-813738c4e3f6 Journal:/var/lib/rook/osd0/journal IsFileStore:true IsDirectory:true DevicePartUUID: CephVolumeInitiated:false LVPath: SkipLVRelease:false Location: LVBackedPV:false}]
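For completeness, the CRUSH location an OSD actually registered can be queried from the toolbox; ceph osd find prints the crush_location of the given OSD id, which confirms whether the rack made it into the map:
ceph osd find 0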
Do you have any idea where the issue is and how I can solve it?
I talked with a Rook dev about this issue in this thread: https://groups.google.com/forum/#!topic/rook-dev/NIO16OZFeGY
He was able to reproduce the problem:
Yohan, I’m also able to reproduce this problem of the labels not being picked up by the OSDs even though the labels are detected in the OSD prepare pod as you see. Could you open a GitHub issue for this? I’m investigating the fix.
But it appears that the issue only concerns OSDs backed by directories; the problem does not exist when the OSDs use devices (like raw devices):
Yohan, I found that this only affects OSDs created on directories. I would recommend you test creating the OSDs on raw devices to get the CRUSH map populated correctly. In the v1.3 release it is also important to note that support for directories on OSDs is being removed. It will be expected that OSDs will be created on raw devices or partitions after that release. See this issue for more details: https://github.com/rook/rook/issues/4724
Since the support for OSDs on directories is being removed in the next release I don’t anticipate fixing this issue.
As you can see, the issue will not be fixed, because support for directory-based OSDs is being removed anyway.
I redid my tests using raw devices instead of directories, and it worked like a charm.
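For reference, a minimal sketch of what the storage section of the CephCluster looks like with raw devices instead of directories (the device name sdb is an assumption here; use your own raw device names):
  storage:
    useAllNodes: false
    useAllDevices: false
    nodes:
    - name: "test-w1"
      devices:
      - name: "sdb" # assumed raw device, adjust to your hardware
    - name: "test-w2"
      devices:
      - name: "sdb" # assumed raw device, adjust to your hardware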
I want to thank Travis for the help he provided and his quick answers!