I've been trying to set up a cluster of 4 Raspberry Pi 4s to run Kubernetes, following an old guide. I did this once before, successfully. But after a move and some other changes, I decided to recreate the cluster with fresh installs of Raspberry Pi OS, the latest version of kubeadm (1.19), etc. The one exception is that I'm using Weave 2.6.5 instead of the latest, per this comment, as it seems the newest version of Weave doesn't work on Pis, something I confirmed myself.
Unfortunately, after brand-new, fresh installs of everything, the CoreDNS pods never come up. The weave-net pod came up successfully, but CoreDNS never does. Here's a list of my running pods:
$ k get pods -n kube-system
NAME                                   READY   STATUS    RESTARTS   AGE
coredns-f9fd979d6-6jlq7                0/1     Running   0          6m4s
coredns-f9fd979d6-qqnzw                0/1     Running   0          6m5s
etcd-k8s-master-1                      1/1     Running   0          24m
kube-apiserver-k8s-master-1            1/1     Running   0          24m
kube-controller-manager-k8s-master-1   1/1     Running   2          24m
kube-proxy-dq62m                       1/1     Running   0          24m
kube-scheduler-k8s-master-1            1/1     Running   2          24m
weave-net-qb7t7                        2/2     Running   0          17m
It's also a little weird that kube-controller-manager and kube-scheduler are periodically restarting; I wonder whether that's related to DNS never coming up. In any case, here's a sample of the logs from one of the DNS containers:
$ k logs -n kube-system pod/coredns-f9fd979d6-6jlq7
.:53
[INFO] plugin/reload: Running configuration MD5 = db32ca3650231d74073ff4cf814959a7
CoreDNS-1.7.0
linux/arm, go1.14.4, f59c03d
[INFO] plugin/ready: Still waiting on: "kubernetes"
[INFO] plugin/ready: Still waiting on: "kubernetes"
I1027 17:22:37.977315 1 trace.go:116] Trace[1427131847]: "Reflector ListAndWatch" name:pkg/mod/k8s.io/client-go@v0.18.3/tools/cache/reflector.go:125 (started: 2020-10-27 17:22:07.975379387 +0000 UTC m=+0.092116055) (total time: 30.00156301s):
Trace[1427131847]: [30.00156301s] [30.00156301s] END
I1027 17:22:37.977301 1 trace.go:116] Trace[2019727887]: "Reflector ListAndWatch" name:pkg/mod/k8s.io/client-go@v0.18.3/tools/cache/reflector.go:125 (started: 2020-10-27 17:22:07.976078211 +0000 UTC m=+0.092814546) (total time: 30.000710725s):
Trace[2019727887]: [30.000710725s] [30.000710725s] END
E1027 17:22:37.977433 1 reflector.go:178] pkg/mod/k8s.io/client-go@v0.18.3/tools/cache/reflector.go:125: Failed to list *v1.Namespace: Get "https://10.96.0.1:443/api/v1/namespaces?limit=500&resourceVersion=0": dial tcp 10.96.0.1:443: i/o timeout
E1027 17:22:37.977471 1 reflector.go:178] pkg/mod/k8s.io/client-go@v0.18.3/tools/cache/reflector.go:125: Failed to list *v1.Service: Get "https://10.96.0.1:443/api/v1/services?limit=500&resourceVersion=0": dial tcp 10.96.0.1:443: i/o timeout
I1027 17:22:37.978491 1 trace.go:116] Trace[911902081]: "Reflector ListAndWatch" name:pkg/mod/k8s.io/client-go@v0.18.3/tools/cache/reflector.go:125 (started: 2020-10-27 17:22:07.97561805 +0000 UTC m=+0.092354423) (total time: 30.002742659s):
Trace[911902081]: [30.002742659s] [30.002742659s] END
E1027 17:22:37.978535 1 reflector.go:178] pkg/mod/k8s.io/client-go@v0.18.3/tools/cache/reflector.go:125: Failed to list *v1.Endpoints: Get "https://10.96.0.1:443/api/v1/endpoints?limit=500&resourceVersion=0": dial tcp 10.96.0.1:443: i/o timeout
[INFO] plugin/ready: Still waiting on: "kubernetes"
[INFO] plugin/ready: Still waiting on: "kubernetes"
As an application developer, I've used Kubernetes and love it. But I must confess that when I get into the gritty undercarriage of what it's doing (read: when it doesn't work), I find myself getting pretty lost. The local IP of my Pi is 192.168.1.194; in fact, all the local IPs are in the 192.168.x.x range. So why does it seem to be trying to reach 10.96.0.1, and then hitting an I/O timeout? Is that normal? Is that just part of the container-networking internals, like a virtual IP mapped onto the Docker system or some such?
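For anyone else puzzled by the same address, here's what I've since pieced together: 10.96.0.1 isn't a LAN address or a Docker artifact. It's the ClusterIP of the built-in `kubernetes` Service, the in-cluster virtual IP that pods (including CoreDNS) use to reach the API server, and by convention it's the first usable address in kubeadm's default service CIDR of 10.96.0.0/12. On a working cluster, `kubectl get svc kubernetes` shows it directly; the value itself just falls out of that default CIDR:

```shell
# 10.96.0.1 is simply the first host address in kubeadm's default
# --service-cidr of 10.96.0.0/12, which you can derive locally:
python3 -c 'import ipaddress; print(next(ipaddress.ip_network("10.96.0.0/12").hosts()))'
# prints 10.96.0.1
```

Traffic sent to that virtual IP is rewritten by kube-proxy (via iptables) to a real endpoint, which on a single-control-plane kubeadm cluster like this one is the node's own IP on the API server's port; that's why host firewall rules end up mattering for it.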
And more importantly, what might I need to do to get DNS working? Naturally I can curl things fine from the console, so DNS works on the Pi itself. Earlier during the setup I also ran the following commands:
sudo iptables -P FORWARD ACCEPT
sudo ufw allow 8080
sudo ufw allow 16443
sudo ufw allow ssh
sudo ufw default allow routed
sudo ufw enable
In my previous setup, these few commands were all that was needed to "un-block" DNS, but this time around they don't seem sufficient: the CoreDNS containers never become fully ready.
I'm happy to provide whatever additional log messages might be helpful.
I figured it out... I am a dummy. I had allowed port 16443 through the firewall (ufw), but the kubeadm API server actually listens on 6443. That also explains the i/o timeouts above: connections to the service IP 10.96.0.1:443 get NAT'ed by kube-proxy to the API server at 192.168.1.194:6443, which ufw was silently dropping. Opening up port 6443 fixed everything.
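For completeness, this is a sketch of the ufw rules matching the control-plane ports the kubeadm documentation lists as required (adjust to your own setup; the crucial one in my case was 6443):

```shell
# Control-plane openings per kubeadm's "required ports" list.
# 6443 is the API server's secure port - the one I'd mistyped as 16443.
sudo ufw allow 6443/tcp         # Kubernetes API server
sudo ufw allow 2379:2380/tcp    # etcd server client API
sudo ufw allow 10250/tcp        # kubelet API
sudo ufw allow 10251/tcp        # kube-scheduler (on 1.19)
sudo ufw allow 10252/tcp        # kube-controller-manager (on 1.19)
```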