kubernetes, ansible, ubuntu-18.04, ansible-inventory, kubespray

Kubespray 'Disable swap' task fails with 'non-zero return code' from swapoff


I ran Kubespray against LXC containers with the configuration below (host RAM: 8 GB; all nodes run Ubuntu 18.04):

|  NAME   |  STATE  |         IPV4  
+---------+---------+-------------------         
| ansible | RUNNING | 10.21.185.23 (eth0)  
| node1   | RUNNING | 10.21.185.158 (eth0)  
| node2   | RUNNING | 10.21.185.186 (eth0)   
| node3   | RUNNING | 10.21.185.65 (eth0)  
| node4   | RUNNING | 10.21.185.106 (eth0)  
| node5   | RUNNING | 10.21.185.14 (eth0) 

On the ansible host, as root, I ran the Kubespray playbook (shown below) to build the cluster and got the error that follows:
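For reference, this is the standard Kubespray playbook invocation; the inventory path here is only illustrative:

ansible-playbook -i inventory/mycluster/hosts.yaml --become --become-user=root cluster.yml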


TASK [kubernetes/preinstall : Disable swap] ******************
fatal: [node1]: FAILED! => {"changed": true, "cmd": ["/sbin/swapoff", "-a"], "delta": "0:00:00.020302", "end": "2020-05-13 07:21:24.974910", "msg": "non-zero return code", "rc": 255, "start": "2020-05-13 07:21:24.954608", "stderr": "", "stderr_lines": [], "stdout": "", "stdout_lines": []}
fatal: [node2]: FAILED! => {"changed": true, "cmd": ["/sbin/swapoff", "-a"], "delta": "0:00:00.010084", "end": "2020-05-13 07:21:25.051443", "msg": "non-zero return code", "rc": 255, "start": "2020-05-13 07:21:25.041359", "stderr": "", "stderr_lines": [], "stdout": "", "stdout_lines": []}
fatal: [node3]: FAILED! => {"changed": true, "cmd": ["/sbin/swapoff", "-a"], "delta": "0:00:00.008382", "end": "2020-05-13 07:21:25.126695", "msg": "non-zero return code", "rc": 255, "start": "2020-05-13 07:21:25.118313", "stderr": "", "stderr_lines": [], "stdout": "", "stdout_lines": []}
fatal: [node4]: FAILED! => {"changed": true, "cmd": ["/sbin/swapoff", "-a"], "delta": "0:00:00.006829", "end": "2020-05-13 07:21:25.196145", "msg": "non-zero return code", "rc": 255, "start": "2020-05-13 07:21:25.189316", "stderr": "", "stderr_lines": [], "stdout": "", "stdout_lines": []}

LXC container configuration (same for node1, node2, node3, node4, node5):

architecture: x86_64
config:
  image.architecture: amd64
  image.description: ubuntu 18.04 LTS amd64 (release) (20200506)
  image.label: release
  image.os: ubuntu
  image.release: bionic
  image.serial: "20200506"
  image.version: "18.04"
  limits.cpu: "2"
  limits.memory: 2GB
  limits.memory.swap: "false"
  linux.kernel_modules: ip_tables,ip6_tables,netlink_diag,nf_nat,overlay
  raw.lxc: "lxc.apparmor.profile=unconfined\nlxc.cap.drop= \nlxc.cgroup.devices.allow=a\nlxc.mount.auto=proc:rw
    sys:rw"
  security.nesting: "true"
  security.privileged: "true"
  volatile.base_image: 93b9eeb85479af2029203b4a56a2f1fdca6a0e1bf23cdc26b567790bf0f3f3bd
  volatile.eth0.hwaddr: 00:16:3e:5a:91:9a
  volatile.idmap.base: "0"
  volatile.idmap.next: '[]'
  volatile.last_state.idmap: '[]'
  volatile.last_state.power: RUNNING
devices: {}
ephemeral: false
profiles:
- default
stateful: false
description: ""
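
For completeness, settings like these can be applied to a container with lxc config set; the keys below are a few taken from the dump above, shown only as an example:

lxc config set node1 limits.cpu 2
lxc config set node1 limits.memory 2GB
lxc config set node1 limits.memory.swap false
lxc config set node1 security.nesting true
lxc config set node1 security.privileged true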

When I run swapoff manually on a node, it produces no output and appears to succeed:

root@node1:~# /sbin/swapoff -a
root@node1:~#

It would be very helpful if anyone has an idea.


Solution

  • I divided this answer into 2 parts:


    TL;DR

    Kubespray fails because it gets a non-zero exit code (255) when running swapoff -a.

    A non-zero exit status indicates failure. This seemingly counter-intuitive scheme is used so there is one well-defined way to indicate success and a variety of ways to indicate various failure modes.

    Gnu.org: Exit Status

    Even if you set limits.memory.swap: "false" in the profile associated with the containers, swapoff -a will still produce this error.

    There is a workaround: disable swap on your host system, for example as sketched below.
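
    A minimal sketch, run as root on the LXC host (the sed pattern assumes a typical swap entry in /etc/fstab):

    # On the LXC host, as root:
    swapoff -a
    # Comment out swap entries so swap stays off after a reboot:
    sed -ri '/\sswap\s/s/^#?/#/' /etc/fstab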

    After that, your container should produce a zero exit code when issuing $ swapoff -a.


    How to install Kubernetes with Kubespray on LXC containers

    Assuming that you created your lxc containers and have full ssh access to them, there are still things to take into consideration before running kubespray.

    I ran kubespray on lxc containers and stumbled upon issues with:

    Storage space

    Please make sure you have enough storage within your storage pool, as a lack of space will cause cluster provisioning to fail. The default storage pool size may not be big enough to hold 5 nodes; you can check it as shown below.
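
    A quick way to check pool usage (assuming the pool is named default):

    # On the LXC host:
    lxc storage list
    lxc storage info default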

    Docker packages

    When provisioning the cluster, please make sure that you have the newest kubespray version available, as older ones had an issue with Docker packages that were not compatible with each other.
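
    One way to get a recent version (the branch name is only an example; pick the latest release):

    git clone https://github.com/kubernetes-sigs/kubespray.git
    cd kubespray
    git checkout release-2.13
    pip3 install -r requirements.txt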

    Kmsg

    The /dev/kmsg character device node provides userspace access to the kernel's printk buffer.

    Kernel.org: Documentation: dev-kmsg

    /dev/kmsg is not available inside an lxc container, and by default kubespray will fail to provision the cluster when it is missing on a node.

    There is a workaround for it. In each lxc container run:

    # Hack required to provision K8s v1.15+ in LXC containers
    mknod /dev/kmsg c 1 11
    # Persist across reboots; on Ubuntu 18.04 the file is /etc/rc.local
    # (the linked script targets CentOS, hence /etc/rc.d/rc.local there)
    echo 'mknod /dev/kmsg c 1 11' >> /etc/rc.local
    chmod +x /etc/rc.local
    

    Github.com: Justmeandopensource: lxd-provisioning: bootstrap-kube.sh

    I tried other workarounds like:

    Kernel modules

    LXC/LXD system containers do not load kernel modules for their own use. What you do is get the host to load the kernel module, and that module will then be available in the container.

    Linuxcontainers.org: How to add kernel modules to LXC container

    Kubespray will check if certain kernel modules are available within your nodes.

    You will need to load the required modules on your host (see the sketch after this paragraph).

    You can add the above modules with $ modprobe MODULE_NAME, or follow this link: Cyberciti.biz: Linux how to load a kernel module automatically.
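
    A sketch of loading them on the host; the module list is an assumption based on the linux.kernel_modules setting shown in the question (plus br_netfilter, which kubespray commonly needs), and the file name under /etc/modules-load.d/ is arbitrary:

    # Run on the LXC host; system containers cannot load modules themselves.
    for mod in ip_tables ip6_tables netlink_diag nf_nat overlay br_netfilter; do
        modprobe "$mod"
        # Persist across host reboots (file name is arbitrary)
        echo "$mod" >> /etc/modules-load.d/kubespray.conf
    done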

    Conntrack

    You will need to install conntrack and load a module named nf_conntrack, for example:
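
    A minimal sketch, assuming Ubuntu 18.04 inside the containers:

    # Inside each container:
    apt-get update && apt-get install -y conntrack
    # On the LXC host (the container cannot load kernel modules itself):
    modprobe nf_conntrack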

    Without the above commands, kubespray will fail at the step that checks the availability of conntrack.

    With these changes in place, you should be able to run a Kubernetes cluster with kubespray in an lxc environment and get node output similar to this:

    root@k8s1:~# kubectl get nodes -o wide
    NAME   STATUS   ROLES    AGE   VERSION   INTERNAL-IP     EXTERNAL-IP   OS-IMAGE             KERNEL-VERSION     CONTAINER-RUNTIME
    k8s1   Ready    master   14h   v1.18.2   10.224.47.185   <none>        Ubuntu 18.04.4 LTS   5.4.0-31-generic   docker://18.9.7
    k8s2   Ready    master   14h   v1.18.2   10.224.47.98    <none>        Ubuntu 18.04.4 LTS   5.4.0-31-generic   docker://18.9.7
    k8s3   Ready    <none>   14h   v1.18.2   10.224.47.46    <none>        Ubuntu 18.04.4 LTS   5.4.0-31-generic   docker://18.9.7
    k8s4   Ready    <none>   14h   v1.18.2   10.224.47.246   <none>        Ubuntu 18.04.4 LTS   5.4.0-31-generic   docker://18.9.7