
k8s 1.18.1: API not reachable since update to 1.18.1


I've updated my cluster to v1.18.1. Some applications that access the API now report that the API is not reachable. A ping against the API address fails as well. Here are two outputs: the first from a Go application and the second from a ping command against the API.

I0623 15:58:57.317985      23 trace.go:201] Trace[163617342]: "Reflector ListAndWatch" name:pkg/mod/k8s.io/client-go@v0.18.1/tools/cache/reflector.go:125 (23-Jun-2020 15:58:00.317) (total time: 30000ms):
Trace[163617342]: [30.000517214s] [30.000517214s] END
E0623 15:58:57.318003      23 reflector.go:178] pkg/mod/k8s.io/client-go@v0.18.1/tools/cache/reflector.go:125: Failed to list *v1.Service: Get "https://10.96.0.1:443/api/v1/namespaces/default/services?limit=500&resourceVersion=0": dial tcp 10.96.0.1:443: i/o timeout
$ kubectl exec -it network-tools -- ping 10.96.0.1
ping: socket: Operation not permitted
command terminated with exit code 2

I can rule out that the API itself is down, because I can still access it via kubectl.

I use flannel as the network plugin. To be on the safe side, I re-applied the official flannel YAML in case it had been updated, but this did not help.

Now I just don't know where the problem comes from. For anyone who can help, here are some more details about the cluster.

Volker

Nodes

$ kgno -o wide
NAME         STATUS   ROLES    AGE    VERSION   INTERNAL-IP       EXTERNAL-IP   OS-IMAGE                KERNEL-VERSION                CONTAINER-RUNTIME
orbisos001   Ready    master   4h5m   v1.18.1   192.168.179.100   <none>        CentOS Linux 7 (Core)   3.10.0-1127.el7.x86_64        cri-o://1.18.1
orbisos002   Ready    <none>   4h3m   v1.18.1   192.168.179.111   <none>        CentOS Linux 7 (Core)   3.10.0-1127.10.1.el7.x86_64   cri-o://1.18.1

Services

$ kgsv --all-namespaces
NAMESPACE     NAME         TYPE           CLUSTER-IP     EXTERNAL-IP       PORT(S)                      AGE
default       kubernetes   ClusterIP      10.96.0.1      <none>            443/TCP                      4h10m
default       proxy        LoadBalancer   10.107.54.36   192.168.179.101   80:31414/TCP,443:32154/TCP   3h15m
kube-system   kube-dns     ClusterIP      10.96.0.10     <none>            53/UDP,53/TCP,9153/TCP       4h10m

/etc/cni/net.d/100-crio-bridge.conf

{
    "cniVersion": "0.3.1",
    "name": "crio",
    "type": "bridge",
    "bridge": "cni0",
    "isGateway": true,
    "ipMasq": true,
    "hairpinMode": true,
    "ipam": {
        "type": "host-local",
        "routes": [
            { "dst": "0.0.0.0/0" },
            { "dst": "1100:200::1/24" }
        ],
        "ranges": [
            [{ "subnet": "10.85.0.0/16" }],
            [{ "subnet": "1100:200::/24" }]
        ]
    }
}

RPM-Versions

$ yum list installed | grep -e kube -e cri-
cri-o.x86_64                       2:1.18.1-1.1.el7                 @crio
cri-tools.x86_64                   1.13.0-1.rhaos4.1.gitc06001f.el7 @tools
kubeadm.x86_64                     1.18.1-0                         @kubernetes
kubectl.x86_64                     1.18.1-0                         @kubernetes
kubelet.x86_64                     1.18.1-0                         @kubernetes

KUBELET_EXTRA_ARGS

$ cat /etc/sysconfig/kubelet
KUBELET_EXTRA_ARGS=--cgroup-driver=systemd --runtime-cgroups=/systemd/system.slice --kubelet-cgroups=/systemd/system.slice

Solution

  • After a long search I have found a solution. The container kube-flannel-amd64 in the kube-system namespace initially logs an error that it has no permission to access iptables. As a result, the IP packets are not routed over the VXLAN, which caused the timeout error.

    To give the container access to the iptables of the host system, I changed privileged: false to privileged: true in the official kube-flannel.yml.

    securityContext:
      privileged: true
      capabilities:
        add: ["NET_ADMIN"]
    

    After redeploying the YAML file, the rules are created successfully:

    I0625 08:53:13.166567       1 vxlan_network.go:60] watching for new subnet leases
    I0625 08:53:13.168489       1 iptables.go:145] Some iptables rules are missing; deleting and recreating rules
    I0625 08:53:13.168501       1 iptables.go:167] Deleting iptables rule: -s 10.244.0.0/16 -d 10.244.0.0/16 -j RETURN
    I0625 08:53:13.169149       1 iptables.go:167] Deleting iptables rule: -s 10.244.0.0/16 ! -d 224.0.0.0/4 -j MASQUERADE --random-fully
    I0625 08:53:13.170435       1 iptables.go:167] Deleting iptables rule: ! -s 10.244.0.0/16 -d 10.244.1.0/24 -j RETURN
    I0625 08:53:13.171086       1 iptables.go:167] Deleting iptables rule: ! -s 10.244.0.0/16 -d 10.244.0.0/16 -j MASQUERADE --random-fully
    I0625 08:53:13.262424       1 iptables.go:155] Adding iptables rule: -s 10.244.0.0/16 -d 10.244.0.0/16 -j RETURN
    I0625 08:53:13.263101       1 iptables.go:145] Some iptables rules are missing; deleting and recreating rules
    I0625 08:53:13.263109       1 iptables.go:167] Deleting iptables rule: -s 10.244.0.0/16 -j ACCEPT
    I0625 08:53:13.264018       1 iptables.go:167] Deleting iptables rule: -d 10.244.0.0/16 -j ACCEPT
    I0625 08:53:13.264195       1 iptables.go:155] Adding iptables rule: -s 10.244.0.0/16 ! -d 224.0.0.0/4 -j MASQUERADE --random-fully
    I0625 08:53:13.264883       1 iptables.go:155] Adding iptables rule: -s 10.244.0.0/16 -j ACCEPT
    I0625 08:53:13.267062       1 iptables.go:155] Adding iptables rule: -d 10.244.0.0/16 -j ACCEPT
    I0625 08:53:13.267781       1 iptables.go:155] Adding iptables rule: ! -s 10.244.0.0/16 -d 10.244.1.0/24 -j RETURN
    I0625 08:53:13.363094       1 iptables.go:155] Adding iptables rule: ! -s 10.244.0.0/16 -d 10.244.0.0/16 -j MASQUERADE --random-fully
    

    My application can now connect to the Kubernetes API.
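For readers wondering what the recreated rules in the log do, here is a minimal Go sketch. It assumes the four RETURN/MASQUERADE rules sit in a single chain and are evaluated first-match in the order they were added in the log. The names rule, natRules and verdict, as well as the pod addresses used in main, are made up for illustration; this is not flannel's code and it ignores everything else iptables does.

```go
package main

import (
	"fmt"
	"net"
)

// rule models one "-s ... -d ... -j TARGET" line from the log.
// notSrc / notDst stand for the "!" negation in front of -s / -d.
type rule struct {
	src, dst       string
	notSrc, notDst bool
	target         string
}

// inCIDR reports whether ip lies inside the given CIDR range.
func inCIDR(cidr string, ip net.IP) bool {
	_, n, err := net.ParseCIDR(cidr)
	if err != nil {
		return false
	}
	return n.Contains(ip)
}

// matches checks the source and destination conditions of a rule.
func (r rule) matches(src, dst net.IP) bool {
	if r.src != "" && inCIDR(r.src, src) == r.notSrc {
		return false
	}
	if r.dst != "" && inCIDR(r.dst, dst) == r.notDst {
		return false
	}
	return true
}

// The four NAT-related rules, in the order the log adds them.
var natRules = []rule{
	{src: "10.244.0.0/16", dst: "10.244.0.0/16", target: "RETURN"},
	{src: "10.244.0.0/16", dst: "224.0.0.0/4", notDst: true, target: "MASQUERADE"},
	{src: "10.244.0.0/16", notSrc: true, dst: "10.244.1.0/24", target: "RETURN"},
	{src: "10.244.0.0/16", notSrc: true, dst: "10.244.0.0/16", target: "MASQUERADE"},
}

// verdict returns the target of the first matching rule.
func verdict(src, dst string) string {
	s, d := net.ParseIP(src), net.ParseIP(dst)
	for _, r := range natRules {
		if r.matches(s, d) {
			return r.target
		}
	}
	return "no match"
}

func main() {
	// Hypothetical pod addresses; node addresses are from the question.
	fmt.Println(verdict("10.244.1.5", "10.244.0.7"))      // pod to pod
	fmt.Println(verdict("10.244.1.5", "192.168.179.100")) // pod to master node
	fmt.Println(verdict("192.168.179.111", "10.244.1.9")) // node to pod in 10.244.1.0/24
	fmt.Println(verdict("192.168.179.111", "10.244.0.9")) // node to pod in another subnet
}
```

In this simplified model, pod-to-pod traffic is left untouched (RETURN), while traffic from a pod to an address outside the pod network, such as the master node, is masqueraded. Without permission to write these rules, flannel cannot set up that behaviour, which fits the timeout seen in the question.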