Tags: docker, istio, rhel7, k3s, coredns

Kubernetes Service requests are answered from the Pod IP rather than the Service IP; CoreDNS not working for istioctl install


Overview of my setup/problem

I'm running a K3s cluster inside a Docker container on a RHEL 7.9 box. This is all on an air-gapped network, so bear with me if you don't see copied-and-pasted examples below.

I'm trying to install Istio on the cluster, but the install hangs while setting up the ingress gateway deployment. It hangs because the ingress gateway pod is unable to resolve the istiod Kubernetes Service.

What I've tried

I tested the image on an Ubuntu Vagrant box, and the Istio install works fine there. I've also tested the install on a Windows 10 machine using Rancher Desktop, and it works there as well. At one point it worked on the RHEL box, but my team did some security hardening over a two period, and naturally nobody knows which change broke my cluster. So I'm trying to narrow down the search.

I've determined that the issue is with CoreDNS in my K3s cluster. I used the dnsutils Docker image and ran nslookup kubernetes.default. The CoreDNS pod logs show the lookup, but the response sent back to nslookup comes from the IP of the CoreDNS pod rather than from the kube-dns Kubernetes Service. nslookup correctly rejects it and reports:

nslookup kubernetes.default
;; reply from unexpected source: 10.42.0.4#53, expected 10.43.0.2#53
;; reply from unexpected source: 10.42.0.4#53, expected 10.43.0.2#53
;; reply from unexpected source: 10.42.0.4#53, expected 10.43.0.2#53
;; connection timed out; no servers could be reached

Here 10.42.0.4 is the CoreDNS pod and 10.43.0.2 is the kube-dns Kubernetes Service in front of that pod.
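To make the check nslookup performs concrete, here is a minimal Python sketch that sends one DNS query over UDP and compares the source of the reply with the address it queried. It assumes the default cluster domain cluster.local (so it queries the fully qualified kubernetes.default.svc.cluster.local instead of relying on search domains) and that it runs inside a pod; the function names are hypothetical. An unconnected socket is used on purpose: with a connected UDP socket the kernel would silently drop a reply from a different address, and the symptom would look like a plain timeout.

```python
import secrets
import socket
import struct


def build_query(name: str, qtype: int = 1) -> tuple[int, bytes]:
    """Build a minimal RFC 1035 DNS query for `name` (qtype 1 = A record)."""
    txid = secrets.randbelow(0x10000)
    # Header: ID, flags (only RD set), QDCOUNT=1, ANCOUNT=NSCOUNT=ARCOUNT=0
    header = struct.pack("!HHHHHH", txid, 0x0100, 1, 0, 0, 0)
    # QNAME: length-prefixed labels terminated by a zero byte
    qname = b"".join(
        bytes([len(label)]) + label.encode("ascii")
        for label in name.rstrip(".").split(".")
    ) + b"\x00"
    question = qname + struct.pack("!HH", qtype, 1)  # QCLASS 1 = IN
    return txid, header + question


def query_and_check_source(server: str, name: str, port: int = 53,
                           timeout: float = 2.0) -> str:
    """Send one UDP query and report whether the reply came from `server`."""
    txid, packet = build_query(name)
    # Unconnected socket: recvfrom() reveals the real source of any reply
    with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as sock:
        sock.settimeout(timeout)
        sock.sendto(packet, (server, port))
        try:
            data, (src_ip, src_port) = sock.recvfrom(4096)
        except socket.timeout:
            return "timeout: no reply received"
    # Ignore anything that does not echo our transaction ID
    if len(data) < 2 or struct.unpack("!H", data[:2])[0] != txid:
        return f"unrelated packet from {src_ip}#{src_port}"
    if (src_ip, src_port) != (server, port):
        return (f"reply from unexpected source: {src_ip}#{src_port}, "
                f"expected {server}#{port}")
    return f"ok: reply from {src_ip}#{src_port}"
```

Run from a pod in the affected cluster, query_and_check_source("10.43.0.2", "kubernetes.default.svc.cluster.local") would be expected to return an "unexpected source" line naming the CoreDNS pod IP, matching the nslookup output above.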

The logs from the failing Istio ingress gateway pod say that it can't retrieve a certificate from the istiod pod because the connection to the istiod Kubernetes Service is timing out. That makes sense, considering I can't resolve kubernetes.default correctly either:

2021-05-27T10:28:07.342344Z     warn    ca      ca request failed, starting attempt 1 in 91.589072ms
2021-05-27T10:28:07.434806Z     warn    ca      ca request failed, starting attempt 2 in 203.792343ms
2021-05-27T10:28:07.639557Z     warn    ca      ca request failed, starting attempt 3 in 364.729652ms
2021-05-27T10:28:08.005300Z     warn    ca      ca request failed, starting attempt 4 in 830.723933ms

It then reports that the DNS lookup of the istiod Service timed out:

transport: Error while dialing dial tcp: lookup istiod.istio-system.svc on 10.96.0.10:53: read udp 10.244.153.113:41187->10.96.0.10:53: i/o timeout

Again, my setup is on an air-gapped network, so ignore the IP addresses in the example above; they were copied from other posts related to my issue.

Where to go from here?

I'm trying to figure out what could be causing this problem. DNS resolution should work out of the box in K3s, and it isn't working correctly. As I stated before, it's not the Docker image I'm running K3s from, since I've gotten K3s and Istio to work on other machines.

Any suggestions on what to do next or advice on how to troubleshoot this would be greatly appreciated. Let me know if there is any other info I can provide to help. Thanks!


Solution

  • TL;DR: bridge-nf-call-iptables and bridge-nf-call-ip6tables were disabled. They need to be enabled.

    I found this using docker info, which listed a warning that bridge-nf-call-iptables and bridge-nf-call-ip6tables were disabled. I found lots of discussion in the CoreDNS and K3s GitHub issues about problems caused by iptables, and our suspicions were correct.

    This link was the solution for us.
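The usual explanation for the symptom is that, with these sysctls at 0, the reply from the CoreDNS pod to the client pod is switched at layer 2 on the node's bridge and never passes through iptables, so conntrack never rewrites its source from the pod IP back to the Service IP. Below is a minimal Python sketch of a check similar to the warning docker info prints. It only reads the standard /proc/sys/net/bridge files; treating a missing file as "br_netfilter not loaded" is an assumption about the host, and the function name is hypothetical.

```python
from pathlib import Path

# The two bridge netfilter sysctls that docker info warns about
BRIDGE_SYSCTLS = ("bridge-nf-call-iptables", "bridge-nf-call-ip6tables")


def check_bridge_nf(proc_root: Path = Path("/proc/sys/net/bridge")) -> dict[str, str]:
    """Return 'enabled', 'disabled', or 'missing' for each bridge sysctl.

    'missing' is assumed to mean the br_netfilter module is not loaded,
    since the files under /proc/sys/net/bridge only exist once it is.
    """
    status = {}
    for name in BRIDGE_SYSCTLS:
        try:
            value = (proc_root / name).read_text().strip()
        except FileNotFoundError:
            status[name] = "missing"
            continue
        status[name] = "enabled" if value == "1" else "disabled"
    return status


if __name__ == "__main__":
    for name, state in check_bridge_nf().items():
        print(f"net.bridge.{name}: {state}")
```

If either value comes back disabled or missing, it is commonly fixed with modprobe br_netfilter followed by sysctl -w net.bridge.bridge-nf-call-iptables=1 and sysctl -w net.bridge.bridge-nf-call-ip6tables=1, plus a file under /etc/sysctl.d/ so the settings survive a reboot. Because K3s runs inside a Docker container here, this is typically done on the RHEL host itself.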