kubernetes, linux-kernel, ceph, cephfs, rook-storage

Kubernetes nodes keep rebooting when using Rook volumes


Several days ago I ran into a problem where my nodes kept rebooting constantly.

My stack:

  • Kubernetes
  • Rook with CephFS
  • Ubuntu 18.04 x86_64 nodes

I was able to get the Ceph cluster running, but when I deployed my application, which used the Rook volumes, my pods suddenly started dying.

I got this message when I ran the kubectl describe pods/<name> command:

Pod sandbox changed, it will be killed and re-created

In the k8s events I got:

<Node name> has been rebooted
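
For reference, these symptoms can be pulled with standard kubectl commands; a minimal sketch, where the pod name and namespace are placeholders:

    # Show the failing pod's events, including the "Pod sandbox changed" message
    kubectl describe pod <pod-name> -n <namespace>

    # Show recent node-level events, including the "has been rebooted" message
    kubectl get events --all-namespaces --field-selector involvedObject.kind=Node --sort-by=.lastTimestamp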

After some time the node comes back to life, but it dies again within 2-3 minutes.

I tried to drain the node and rejoin it to the cluster, but after that another node started getting the same error.
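
For reference, a typical drain-and-return cycle looks roughly like this (the node name is a placeholder, and extra flags may be needed for pods with local or emptyDir data, depending on your kubectl version):

    # Evict regular workloads from the node (DaemonSet-managed pods are skipped)
    kubectl drain <node-name> --ignore-daemonsets

    # ...investigate or reboot the node...

    # Mark the node schedulable again so pods can return
    kubectl uncordon <node-name>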

I looked into the system error logs of a failed node with the journalctl -p 3 command.

I found that the logs were flooded with this message: kernel: cache_from_obj: Wrong slab cache. inode_cache but object is from ceph_inode_info.
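
If you want to run the same kind of check, a minimal sketch (the -b -1 boot selector assumes the node has already rebooted at least once):

    # Kernel messages of priority "err" and above from the previous boot
    journalctl -k -p 3 -b -1

    # Kernel version running on this node
    uname -r

    # Kernel version of every node in the cluster, as reported by the kubelet
    kubectl get nodes -o wide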

After googling this problem, I found this issue: https://github.com/coreos/bugs/issues/2616

It turned out that CephFS just doesn't work with some versions of the Linux kernel! None of the kernel versions I had tried worked.


Solution

CephFS doesn't work with some versions of the Linux kernel. Upgrade your kernel. I finally got it working on Ubuntu 18.04 x86_64 with kernel 5.0.0-38-generic.
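
A sketch of one way to get onto a newer kernel on Ubuntu 18.04 is below; it assumes the HWE (hardware enablement) kernel packages are acceptable for your nodes, and the exact 5.x version you end up with depends on the current HWE release. Drain the node first.

    # Install the Ubuntu 18.04 HWE kernel (a 5.x series kernel)
    sudo apt-get update
    sudo apt-get install --install-recommends linux-generic-hwe-18.04

    # Reboot into the new kernel and confirm the version
    sudo reboot
    uname -r    # e.g. 5.0.0-38-generic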

The GitHub issue that helped me: https://github.com/coreos/bugs/issues/2616

This is indeed a tricky issue. I struggled to find a solution and spent a lot of time trying to understand what was happening. I hope this information helps someone, because there is not much about it on Google.