azure-aks nginx-ingress cert-manager

cert-manager does not issue certificate after upgrading to AKS k8s 1.24.6


I have an automated setup with scripts and helm that creates a Kubernetes cluster on MS Azure and deploys my application to it. First of all: everything works fine when I create a cluster with Kubernetes 1.23.12. After a few minutes everything is installed, I can access my website, and there is a certificate issued by Let's Encrypt. But when I delete this cluster completely, reinstall it, and change only the Kubernetes version from 1.23.12 to 1.24.6, I don't get a certificate any more.

I see that the acme challenge is not working. I get the following error:

Waiting for HTTP-01 challenge propagation: failed to perform self check GET request 'http://my.hostname.de/.well-known/acme-challenge/2Y25fxsoeQTIqprKNR4iI4X81jPoLknmRNvj9uhcOLk': Get "http://my.hostname.de/.well-known/acme-challenge/2Y25fxsoeQTIqprKNR4iI4X81jPoLknmRNvj9uhcOLk": dial tcp: lookup my.hostname.de on 10.0.0.10:53: no such host

After some time the error message changes to:

'Error accepting authorization: acme: authorization error for my.hostname.de: 400 urn:ietf:params:acme:error:connection: 20.79.77.156: Fetching http://my.hostname.de/.well-known/acme-challenge/2Y25fxsoeQTIqprKNR4iI4X81jPoLknmRNvj9uhcOLk: Timeout during connect (likely firewall problem)'

10.0.0.10 is the cluster IP of kube-dns in my Kubernetes cluster. Under "Services and Ingresses" in the Azure portal I can see the ports 53/UDP;53/TCP for the cluster IP 10.0.0.10. I can also see there that 20.79.77.156 is the external IP of the ingress-nginx-controller (ports: 80:32284/TCP;443:32380/TCP).

So I do not understand why the acme challenge cannot be performed successfully.

Here is some information about the version numbers:

  • Azure Kubernetes: 1.24.6
  • helm: 3.11
  • cert-manager: 1.11.0
  • ingress-nginx helm-chart: 4.4.2 -> controller-v1.5.1

I have tried to find the same error on the internet, but it rarely comes up, and the solutions I found do not seem to fit my problem.

Of course I have read a lot about k8s 1.24.

It is not a dockershim problem, because I have tested the cluster with the Detector for Docker Socket (DDS) tool.

I have updated cert-manager and ingress-nginx to newer versions (see above).

I have also tried it with Kubernetes 1.25.4 -> same error

I have found this on the cert-manager website: "cert-manager expects that ServerSideApply is enabled in the cluster for all versions of Kubernetes from 1.24 and above."

I think I understood the difference between Server Side Apply and Client Side Apply, but I don't know if and how I can enable it in my cluster and if this could be a solution to my problem.

Any help is appreciated. Thanks in advance!


Solution

  • I solved this myself recently. Try this for your ingress controller:

    ingress-nginx:
      rbac:
        create: true
      controller:
        service:
          annotations:
            service.beta.kubernetes.io/azure-load-balancer-health-probe-request-path: /healthz
    

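    On AKS with k8s 1.24+, the Azure load balancer uses a different endpoint for its health probes. The annotation points the probe at /healthz, which the ingress-nginx controller answers, so the load balancer treats the backend as healthy and forwards traffic on ports 80 and 443 again.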
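
The values above are nested under an `ingress-nginx:` key, which is how Helm passes values to a chart that is pulled in as a dependency. If the ingress-nginx chart is installed directly instead, the same setting would presumably sit at the top level of the values file. A minimal sketch, assuming a standalone install (the file name `values.yaml` is only illustrative, and the `rbac` entry is left out because it is not related to the health probe):

```yaml
# values.yaml (hypothetical file name) for a direct install of the ingress-nginx chart.
# Assumption: the chart is not a subchart, so there is no top-level "ingress-nginx:" key.
controller:
  service:
    annotations:
      # Ask the Azure load balancer to probe /healthz on the controller service.
      service.beta.kubernetes.io/azure-load-balancer-health-probe-request-path: /healthz
```

A file like this is typically passed to helm with the `-f` option when installing or upgrading the release. Once the annotation is on the controller's LoadBalancer service, the HTTP-01 challenge URL should become reachable from outside and cert-manager can retry the order.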