kubernetes kubelet coredns juju resolv

coredns is running but not ready after conjure-up k8s cdk


I deployed Kubernetes v1.18.2 (CDK) using conjure-up (which used bionic). Update: I destroyed that environment completely and redeployed it manually using the CDK bundle at https://jaas.ai/canonical-kubernetes, with the same Kubernetes version and the same OS version (Ubuntu 18.04). It made no difference.

CoreDNS forwards upstream queries using /etc/resolv.conf; see the ConfigMap below:

Name:         coredns
Namespace:    kube-system
Labels:       cdk-addons=true
Annotations:  
Data
====
Corefile:
----
.:53 {
    errors
    health {
      lameduck 5s
    }
    ready
    kubernetes cluster.local in-addr.arpa ip6.arpa {
      fallthrough in-addr.arpa ip6.arpa
    }
    prometheus :9153
    forward . /etc/resolv.conf
    cache 30
    loop
    reload
    loadbalance
}

Events:  <none>

There is a known issue, described at https://kubernetes.io/docs/tasks/administer-cluster/dns-debugging-resolution/#known-issues, about using /etc/resolv.conf instead of /run/systemd/resolve/resolv.conf.
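On hosts running systemd-resolved, /etc/resolv.conf usually lists only a local stub resolver on a loopback address, while /run/systemd/resolve/resolv.conf lists the real upstream servers. A minimal sketch for checking which of the two files contains loopback nameservers, assuming Python 3 is available on the node (the helper name `loopback_nameservers` is hypothetical):

```python
import ipaddress


def loopback_nameservers(resolv_conf_text):
    """Return the nameserver entries that point at a loopback address."""
    found = []
    for line in resolv_conf_text.splitlines():
        fields = line.split()
        # Only "nameserver <address>" lines matter; comments, search and
        # options lines are skipped.
        if len(fields) >= 2 and fields[0] == "nameserver":
            try:
                # Drop an IPv6 zone suffix such as "%eth0" before parsing.
                address = ipaddress.ip_address(fields[1].split("%")[0])
            except ValueError:
                continue
            if address.is_loopback:
                found.append(fields[1])
    return found


if __name__ == "__main__":
    for path in ("/etc/resolv.conf", "/run/systemd/resolve/resolv.conf"):
        try:
            with open(path) as handle:
                result = loopback_nameservers(handle.read())
            print(path, "->", result or "no loopback nameservers")
        except OSError as error:
            print(path, "-> could not read:", error)
```

If the first file reports a loopback entry and the second does not, the known issue may apply to that node; if neither does, the cause is probably elsewhere.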

I edited the coredns ConfigMap to point it to /run/systemd/resolve/resolv.conf, but the setting gets reverted.

I also tried setting kubelet-extra-config to {resolvConf: /run/systemd/resolve/resolv.conf} and restarted the server, but nothing changed:

kubelet-extra-config:
    default: '{}'
    description: |
      Extra configuration to be passed to kubelet. Any values specified in this
      config will be merged into a KubeletConfiguration file that is passed to
      the kubelet service via the --config flag. This can be used to override
      values provided by the charm.
      Requires Kubernetes 1.10+.
      The value for this config must be a YAML mapping that can be safely
      merged with a KubeletConfiguration file. For example:
        {evictionHard: {memory.available: 200Mi}}
      For more information about KubeletConfiguration, see upstream docs:
      https://kubernetes.io/docs/tasks/administer-cluster/kubelet-config-file/
    source: user
    type: string
    value: '{resolvConf: /run/systemd/resolve/resolv.conf}'

However, the change does show up in the kubelet configuration when I inspect it as described at https://kubernetes.io/docs/tasks/administer-cluster/reconfigure-kubelet/

...
"resolvConf": "/run/systemd/resolve/resolv.conf",
...

This is the error I get in the coredns pod:

E0429 09:16:42.172959       1 reflector.go:153] pkg/mod/k8s.io/client-go@v0.17.2/tools/cache/reflector.go:105: Failed to list *v1.Endpoints: Get https://10.152.183.1:443/api/v1/endpoints?limit=500&resourceVersion=0: dial tcp 10.152.183.1:443: i/o timeout
[INFO] plugin/ready: Still waiting on: "kubernetes"
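The `dial tcp 10.152.183.1:443: i/o timeout` message suggests that the pod cannot open a TCP connection to the API server's ClusterIP at all, which would point at pod-to-service networking rather than at the DNS configuration. A minimal sketch for testing that reachability from inside a pod or on a worker node, assuming Python 3 is available there (the helper name `can_connect` is hypothetical):

```python
import socket


def can_connect(host, port, timeout=3.0):
    """Return True if a TCP connection to host:port succeeds within timeout seconds."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers timeouts, refused connections, and unreachable networks.
        return False


if __name__ == "__main__":
    # ClusterIP and port taken from the error message above.
    print("10.152.183.1:443 reachable:", can_connect("10.152.183.1", 443))
```

This only checks that the TCP handshake completes; it does not validate TLS or authentication against the API server.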

Here is the kubernetes service:

default                           kubernetes                               ClusterIP   10.152.183.1     <none>        443/TCP                  4h42m   <none>

Here is the coredns deployment:

Name:                   coredns
Namespace:              kube-system
CreationTimestamp:      Wed, 29 Apr 2020 09:15:07 +0000
Labels:                 cdk-addons=true
                        cdk-restart-on-ca-change=true
                        k8s-app=kube-dns
                        kubernetes.io/name=CoreDNS
Annotations:            deployment.kubernetes.io/revision: 1
Selector:               k8s-app=kube-dns
Replicas:               1 desired | 1 updated | 1 total | 0 available | 1 unavailable
StrategyType:           RollingUpdate
MinReadySeconds:        0
RollingUpdateStrategy:  1 max unavailable, 25% max surge
Pod Template:
  Labels:           k8s-app=kube-dns
  Service Account:  coredns
  Containers:
   coredns:
    Image:       rocks.canonical.com:443/cdk/coredns/coredns-amd64:1.6.7
    Ports:       53/UDP, 53/TCP, 9153/TCP
    Host Ports:  0/UDP, 0/TCP, 0/TCP
    Args:
      -conf
      /etc/coredns/Corefile
    Limits:
      memory:  170Mi
    Requests:
      cpu:        100m
      memory:     70Mi
    Liveness:     http-get http://:8080/health delay=60s timeout=5s period=10s #success=1 #failure=5
    Readiness:    http-get http://:8181/ready delay=0s timeout=1s period=10s #success=1 #failure=3
    Environment:  <none>
    Mounts:
      /etc/coredns from config-volume (ro)
  Volumes:
   config-volume:
    Type:               ConfigMap (a volume populated by a ConfigMap)
    Name:               coredns
    Optional:           false
  Priority Class Name:  system-cluster-critical
Conditions:
  Type           Status  Reason
  ----           ------  ------
  Available      True    MinimumReplicasAvailable
  Progressing    False   ProgressDeadlineExceeded
OldReplicaSets:  <none>
NewReplicaSet:   coredns-6b59b8bd9f (1/1 replicas created)
Events:
  Type    Reason             Age   From                   Message
  ----    ------             ----  ----                   -------
  Normal  ScalingReplicaSet  11m   deployment-controller  Scaled up replica set coredns-6b59b8bd9f to 1

Can anyone help, please?

More info: the kubernetes Service is configured correctly:

Name:              kubernetes
Namespace:         default
Labels:            component=apiserver
                   provider=kubernetes
Annotations:       <none>
Selector:          <none>
Type:              ClusterIP
IP:                10.152.183.1
Port:              https  443/TCP
TargetPort:        6443/TCP
Endpoints:         xx.xx.xx.xx:6443,xx.xx.xx.yy:6443
Session Affinity:  None
Events:            <none>

I can curl both endpoint IP addresses with --insecure.

Describing the endpoints:

kubectl describe ep kubernetes 
Name:         kubernetes
Namespace:    default
Labels:       <none>
Annotations:  <none>
Subsets:
  Addresses:          xx.xx.xx.xx,xx.xx.xx.yy
  NotReadyAddresses:  <none>
  Ports:
    Name   Port  Protocol
    ----   ----  --------
    https  6443  TCP

Events:  <none>

Additional findings: most of the vnet interfaces created by Juju during the CDK deployment do not appear to be running (they report state UNKNOWN; see the listing below). I suspect this is because of AppArmor, based on this note from https://jaas.ai/canonical-kubernetes/bundle/21:

Note: If you desire to deploy this bundle locally on your laptop, see the segment about Conjure-Up under Alternate Deployment Methods. Default deployment via juju will not properly adjust the apparmor profile to support running kubernetes in LXD. At this time, it is a necessary intermediate deployment mechanism.

7: vnet0: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc fq_codel master br1 state UNKNOWN group default qlen 1000
    link/ether fe:54:00:f0:0c:29 brd ff:ff:ff:ff:ff:ff
    inet6 fe80::fc54:ff:fef0:c29/64 scope link 
       valid_lft forever preferred_lft forever
70: vnet12: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc fq_codel master br1 state UNKNOWN group default qlen 1000
    link/ether fe:54:00:00:a3:94 brd ff:ff:ff:ff:ff:ff
    inet6 fe80::fc54:ff:fe00:a394/64 scope link 
       valid_lft forever preferred_lft forever
72: vnet13: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc fq_codel master br1 state UNKNOWN group default qlen 1000
    link/ether fe:54:00:15:17:f4 brd ff:ff:ff:ff:ff:ff
    inet6 fe80::fc54:ff:fe15:17f4/64 scope link 
       valid_lft forever preferred_lft forever
74: vnet14: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc fq_codel master br1 state UNKNOWN group default qlen 1000
    link/ether fe:54:00:ec:5c:72 brd ff:ff:ff:ff:ff:ff
    inet6 fe80::fc54:ff:feec:5c72/64 scope link 
       valid_lft forever preferred_lft forever
76: vnet15: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc fq_codel master br1 state UNKNOWN group default qlen 1000
    link/ether fe:54:00:60:79:18 brd ff:ff:ff:ff:ff:ff
    inet6 fe80::fc54:ff:fe60:7918/64 scope link 
       valid_lft forever preferred_lft forever
79: vnet16: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc fq_codel master br1 state UNKNOWN group default qlen 1000
    link/ether fe:54:00:67:ff:14 brd ff:ff:ff:ff:ff:ff
    inet6 fe80::fc54:ff:fe67:ff14/64 scope link 
       valid_lft forever preferred_lft forever
81: vnet17: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc fq_codel master br1 state UNKNOWN group default qlen 1000
    link/ether fe:54:00:96:71:01 brd ff:ff:ff:ff:ff:ff
    inet6 fe80::fc54:ff:fe96:7101/64 scope link 
       valid_lft forever preferred_lft forever
83: vnet18: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc fq_codel master br1 state UNKNOWN group default qlen 1000
    link/ether fe:54:00:a8:1d:b7 brd ff:ff:ff:ff:ff:ff
    inet6 fe80::fc54:ff:fea8:1db7/64 scope link 
       valid_lft forever preferred_lft forever
85: vnet19: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc fq_codel master br1 state UNKNOWN group default qlen 1000
    link/ether fe:54:00:2a:89:c1 brd ff:ff:ff:ff:ff:ff
    inet6 fe80::fc54:ff:fe2a:89c1/64 scope link 
       valid_lft forever preferred_lft forever
87: vnet20: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc fq_codel master br1 state UNKNOWN group default qlen 1000
    link/ether fe:54:00:4e:ce:fb brd ff:ff:ff:ff:ff:ff
    inet6 fe80::fc54:ff:fe4e:cefb/64 scope link 
       valid_lft forever preferred_lft forever
89: vnet21: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc fq_codel master br1 state UNKNOWN group default qlen 1000
    link/ether fe:54:00:93:55:ac brd ff:ff:ff:ff:ff:ff
    inet6 fe80::fc54:ff:fe93:55ac/64 scope link 
       valid_lft forever preferred_lft forever
90: vnet22: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc fq_codel master br1 state UNKNOWN group default qlen 1000
    link/ether fe:54:00:b7:ae:b2 brd ff:ff:ff:ff:ff:ff
    inet6 fe80::fc54:ff:feb7:aeb2/64 scope link 
       valid_lft forever preferred_lft forever

Another update: I tried a xenial deployment and found that /etc/resolv.conf is configured correctly there, but the CoreDNS problem remained the same.


Solution

  • It turned out that flannel's network range was conflicting with my local network. Specifying the following in Juju's bundle.yaml before deployment:

    applications:
      flannel:
        options:
          cidr: 10.2.0.0/16

    resolved the issue once and for all! :)
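To check for this kind of conflict before deploying, the candidate flannel CIDR can be compared against the networks already in use locally. A minimal sketch using Python's standard `ipaddress` module; the local network values are hypothetical placeholders, and `cidrs_overlap` is an illustrative helper name:

```python
import ipaddress


def cidrs_overlap(cidr_a, cidr_b):
    """Return True if the two networks share at least one address."""
    # strict=False accepts inputs with host bits set, e.g. "10.2.0.5/16".
    net_a = ipaddress.ip_network(cidr_a, strict=False)
    net_b = ipaddress.ip_network(cidr_b, strict=False)
    return net_a.overlaps(net_b)


if __name__ == "__main__":
    flannel_cidr = "10.2.0.0/16"  # value from the bundle.yaml above
    # Hypothetical local networks; replace with the ranges used on your hosts.
    local_networks = ["10.1.0.0/24", "192.168.1.0/24"]
    for local in local_networks:
        print(local, "overlaps", flannel_cidr, "->", cidrs_overlap(local, flannel_cidr))
```

Any `True` result means the chosen `cidr` collides with an existing network and a different range should be picked.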