kubernetes flannel cni

Kubernetes - segfault in libnftnl.so.11.3.0 on flannel CNI


I have a self-managed Kubernetes cluster consisting of one master node and three worker nodes. I use flannel as the Container Network Interface (CNI) plugin within the cluster.

On all of my machines I see kernel messages like the following:

Apr 12 04:22:24 worker-7 kernel: [278523.379954] iptables[6260]: segfault at 88 ip 00007f9e69fefe47 sp 00007ffee4dff356 error 4 in libnftnl.so.11.3.0[7f9e69feb000+16000]
Apr 12 04:22:24 worker-7 kernel: [278523.380094] Code: bf 88 00 00 00 48 8b 2f 48 39 df 74 13 4c 89 ee 41 ff d4 85 c0 78 0b 48 89 ef 48 8b 6d 00 eb e8 31 c0 5a 5b 5d 41 5c 41 5d c3 <48> 8b 87 88 00 00 00 48 81 c7 78 00 00 00 48 39 f8 74 0b 85 f6 74
Apr 12 05:59:10 worker-7 kernel: [284329.182667] iptables[13978]: segfault at 88 ip 00007fb799fafe47 sp 00007fff22419b36 error 4 in libnftnl.so.11.3.0[7fb799fab000+16000]
Apr 12 05:59:10 worker-7 kernel: [284329.182774] Code: bf 88 00 00 00 48 8b 2f 48 39 df 74 13 4c 89 ee 41 ff d4 85 c0 78 0b 48 89 ef 48 8b 6d 00 eb e8 31 c0 5a 5b 5d 41 5c 41 5d c3 <48> 8b 87 88 00 00 00 48 81 c7 98 00 00 00 48 39 f8 74 0b 85 f6 74
Apr 12 08:29:25 worker-7 kernel: [293343.999073] iptables[16041]: segfault at 88 ip 00007fa40c7f7e47 sp 00007ffe04ba9886 error 4 in libnftnl.so.11.3.0[7fa40c7f3000+16000]
Apr 12 08:29:25 worker-7 kernel: [293343.999165] Code: bf 88 00 00 00 48 8b 2f 48 39 df 74 13 4c 89 ee 41 ff d4 85 c0 78 0b 48 89 ef 48 8b 6d 00 eb e8 31 c0 5a 5b 5d 41 5c 41 5d c3 <48> 8b 87 88 00 00 00 48 81 c7 98 00 00 00 48 39 f8 74 0b 85 f6 74

I narrowed it down to the kube-flannel-ds pods as the source of these messages. Their logs contain entries like these:

Failed to ensure iptables rules: Error checking rule existence: failed to check rule existence: running [/sbin/iptables -t filter -C FORWARD -s 10.244.0.0/16 -j ACCEPT --wait]: exit status -1:
Failed to ensure iptables rules: Error checking rule existence: failed to check rule existence: running [/sbin/iptables -t nat -C POSTROUTING -s 10.244.0.0/16 ! -d 224.0.0.0/4 -j MASQUERADE --random-fully --wait]: exit status -1:
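
For reference, I pulled these entries from the flannel pods roughly like this (assuming the default labels from the upstream kube-flannel-ds manifest and the kube-system namespace):

kubectl -n kube-system logs -l app=flannel --tail=50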

Can someone explain what these messages indicate? Could this be a hardware problem? Does it make sense to switch from flannel to another Kubernetes container network interface (CNI), e.g. Calico?


Solution

  • As already mentioned in the comments, Debian Buster uses the nftables backend instead of iptables:

    NOTE: iptables is being replaced by nftables starting with Debian Buster - reference here

    Unfortunately, the nftables backend is not compatible with Kubernetes at the moment.

    In Linux, nftables is available as a modern replacement for the kernel’s iptables subsystem. The iptables tooling can act as a compatibility layer, behaving like iptables but actually configuring nftables. This nftables backend is not compatible with the current kubeadm packages: it causes duplicated firewall rules and breaks kube-proxy. You could try switching to the legacy backend as described here (a sketch of the commands is at the end of this answer), but I'm not sure about this solution as I don't have a way to test it with your OS. I solved a similar case on Debian with this here.

    An alternative would be to switch to Calico, which actually supports the nftables backend via FELIX_IPTABLESBACKEND. This parameter controls which variant of the iptables binary Felix uses. Set it to Auto for auto-detection of the backend. If a specific backend is needed, use NFT for hosts running the nftables backend or Legacy for others. [Default: Auto] A minimal kubectl sketch follows below.

    When installing Calico with containerd, please also have a look at this case.
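
    As a rough sketch of the switch to the legacy backend on Debian Buster (the binary paths follow the kubeadm installation docs, please verify them on your own nodes), you could check and change the backend per node like this:

      # show which backend the iptables wrapper currently uses
      iptables --version        # prints e.g. "iptables v1.8.2 (nf_tables)" or "(legacy)"

      # point the Debian alternatives at the legacy binaries
      update-alternatives --set iptables /usr/sbin/iptables-legacy
      update-alternatives --set ip6tables /usr/sbin/ip6tables-legacy
      update-alternatives --set arptables /usr/sbin/arptables-legacy
      update-alternatives --set ebtables /usr/sbin/ebtables-legacy

    Afterwards a reboot of the node (or at least restarting kubelet, kube-proxy and the flannel pods) is probably needed so the rules get recreated with the legacy backend.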
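
    If you go the Calico route instead, a minimal sketch for setting the Felix parameter on an already running calico-node DaemonSet could look like this (the DaemonSet name and the kube-system namespace are assumptions based on the default calico.yaml manifest):

      kubectl -n kube-system set env daemonset/calico-node FELIX_IPTABLESBACKEND=Auto
      # or force the nftables variant explicitly:
      kubectl -n kube-system set env daemonset/calico-node FELIX_IPTABLESBACKEND=NFT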