I've been trying to get the http-01 challenge method working with Traefik v2 and cert-manager, both installed through their current Helm charts. The LB endpoint can be reached through both its IP and its hostname, and I've tested that the HTTP host passes on letsdebug ("No issues were found with <domain>").
Traefik lives in the traefik namespace, while cert-manager lives in its own cert-manager namespace. I've created a ClusterIssuer inside the cert-manager namespace:
apiVersion: cert-manager.io/v1
kind: ClusterIssuer
metadata:
  name: letsencrypt-staging
spec:
  acme:
    server: https://acme-staging-v02.api.letsencrypt.org/directory
    email: removed@example.com
    privateKeySecretRef:
      name: letsencrypt-staging
    solvers:
    - http01:
        ingress:
          class: traefik
          ingressTemplate:
            metadata:
              namespace: cert-manager
              annotations:
                traefik.ingress.kubernetes.io/router.entrypoints: web
The ingressTemplate part is my attempt at making the randomly named ingress created by cert-manager map to the correct Traefik entrypoint. It hasn't changed anything, but I've left it in in case I've fubared anything here.
I've then created a Certificate and applied it. I've tried the cert-manager, traefik and default namespaces for this, with the same result each time (the actual domain name has been replaced with domain.example.com):
apiVersion: cert-manager.io/v1
kind: Certificate
metadata:
  name: domain.example.com
spec:
  secretName: domain-example-com-tls
  issuerRef:
    kind: ClusterIssuer
    name: letsencrypt-staging
  commonName: domain.example.com
  dnsNames:
  - domain.example.com
Looking at the logs for the cert-manager pod, I can see both a 404 error and then a "DNS A record error" - the DNS record error seems spurious as it can be resolved with other services and has been present for > 24hrs.
I0413 12:37:51.478359 1 conditions.go:201] Setting lastTransitionTime for Certificate "domain.example.com" condition "Issuing" to 2022-04-13 12:37:51.478353098 +0000 UTC m=+6998.327004050
I0413 12:37:51.760018 1 controller.go:161] cert-manager/certificates-key-manager "msg"="re-queuing item due to optimistic locking on resource" "key"="default/domain.example.com" "error"="Operation cannot be fulfilled on certificates.cert-manager.io \"domain.example.com\": the object has been modified; please apply your changes to the latest version and try again"
I0413 12:37:51.769026 1 conditions.go:261] Setting lastTransitionTime for CertificateRequest "domain.example.com-r98k2" condition "Approved" to 2022-04-13 12:37:51.769016958 +0000 UTC m=+6998.617667914
I0413 12:37:51.836517 1 conditions.go:261] Setting lastTransitionTime for CertificateRequest "domain.example.com-r98k2" condition "Ready" to 2022-04-13 12:37:51.836496254 +0000 UTC m=+6998.685147170
I0413 12:37:51.868932 1 conditions.go:261] Setting lastTransitionTime for CertificateRequest "domain.example.com-r98k2" condition "Ready" to 2022-04-13 12:37:51.868921204 +0000 UTC m=+6998.717572135
I0413 12:37:51.888553 1 controller.go:161] cert-manager/certificaterequests-issuer-acme "msg"="re-queuing item due to optimistic locking on resource" "key"="default/domain.example.com-r98k2" "error"="Operation cannot be fulfilled on certificaterequests.cert-manager.io \"domain.example.com-r98k2\": the object has been modified; please apply your changes to the latest version and try again"
E0413 12:37:53.529269 1 controller.go:210] cert-manager/challenges/scheduler "msg"="error scheduling challenge for processing" "error"="Operation cannot be fulfilled on challenges.acme.cert-manager.io \"domain.example.com-r98k2-2809069211-587139531\": the object has been modified; please apply your changes to the latest version and try again" "resource_kind"="Challenge" "resource_name"="domain.example.com-r98k2-2809069211-587139531" "resource_namespace"="default" "resource_version"="v1"
I0413 12:37:55.028477 1 pod.go:71] cert-manager/challenges/http01/ensurePod "msg"="creating HTTP01 challenge solver pod" "dnsName"="domain.example.com" "resource_kind"="Challenge" "resource_name"="domain.example.com-r98k2-2809069211-587139531" "resource_namespace"="default" "resource_version"="v1" "type"="HTTP-01"
I0413 12:37:55.237109 1 pod.go:59] cert-manager/challenges/http01/selfCheck/http01/ensurePod "msg"="found one existing HTTP01 solver pod" "dnsName"="domain.example.com" "related_resource_kind"="Pod" "related_resource_name"="cm-acme-http-solver-k8wl8" "related_resource_namespace"="default" "related_resource_version"="v1" "resource_kind"="Challenge" "resource_name"="domain.example.com-r98k2-2809069211-587139531" "resource_namespace"="default" "resource_version"="v1" "type"="HTTP-01"
I0413 12:37:55.237350 1 service.go:43] cert-manager/challenges/http01/selfCheck/http01/ensureService "msg"="found one existing HTTP01 solver Service for challenge resource" "dnsName"="domain.example.com" "related_resource_kind"="Service" "related_resource_name"="cm-acme-http-solver-gvvkt" "related_resource_namespace"="default" "related_resource_version"="v1" "resource_kind"="Challenge" "resource_name"="domain.example.com-r98k2-2809069211-587139531" "resource_namespace"="default" "resource_version"="v1" "type"="HTTP-01"
I0413 12:37:55.237539 1 ingress.go:99] cert-manager/challenges/http01/selfCheck/http01/ensureIngress "msg"="found one existing HTTP01 solver ingress" "dnsName"="domain.example.com" "related_resource_kind"="Ingress" "related_resource_name"="cm-acme-http-solver-pbs7c" "related_resource_namespace"="default" "related_resource_version"="v1" "resource_kind"="Challenge" "resource_name"="domain.example.com-r98k2-2809069211-587139531" "resource_namespace"="default" "resource_version"="v1" "type"="HTTP-01"
E0413 12:37:55.260608 1 sync.go:186] cert-manager/challenges "msg"="propagation check failed" "error"="wrong status code '404', expected '200'" "dnsName"="domain.example.com" "resource_kind"="Challenge" "resource_name"="domain.example.com-r98k2-2809069211-587139531" "resource_namespace"="default" "resource_version"="v1" "type"="HTTP-01"
I0413 12:37:55.299879 1 pod.go:59] cert-manager/challenges/http01/selfCheck/http01/ensurePod "msg"="found one existing HTTP01 solver pod" "dnsName"="domain.example.com" "related_resource_kind"="Pod" "related_resource_name"="cm-acme-http-solver-k8wl8" "related_resource_namespace"="default" "related_resource_version"="v1" "resource_kind"="Challenge" "resource_name"="domain.example.com-r98k2-2809069211-587139531" "resource_namespace"="default" "resource_version"="v1" "type"="HTTP-01"
I0413 12:37:55.300223 1 service.go:43] cert-manager/challenges/http01/selfCheck/http01/ensureService "msg"="found one existing HTTP01 solver Service for challenge resource" "dnsName"="domain.example.com" "related_resource_kind"="Service" "related_resource_name"="cm-acme-http-solver-gvvkt" "related_resource_namespace"="default" "related_resource_version"="v1" "resource_kind"="Challenge" "resource_name"="domain.example.com-r98k2-2809069211-587139531" "resource_namespace"="default" "resource_version"="v1" "type"="HTTP-01"
I0413 12:37:55.300570 1 ingress.go:99] cert-manager/challenges/http01/selfCheck/http01/ensureIngress "msg"="found one existing HTTP01 solver ingress" "dnsName"="domain.example.com" "related_resource_kind"="Ingress" "related_resource_name"="cm-acme-http-solver-pbs7c" "related_resource_namespace"="default" "related_resource_version"="v1" "resource_kind"="Challenge" "resource_name"="domain.example.com-r98k2-2809069211-587139531" "resource_namespace"="default" "resource_version"="v1" "type"="HTTP-01"
E0413 12:37:55.316802 1 sync.go:186] cert-manager/challenges "msg"="propagation check failed" "error"="wrong status code '404', expected '200'" "dnsName"="domain.example.com" "resource_kind"="Challenge" "resource_name"="domain.example.com-r98k2-2809069211-587139531" "resource_namespace"="default" "resource_version"="v1" "type"="HTTP-01"
I0413 12:38:05.261345 1 pod.go:59] cert-manager/challenges/http01/selfCheck/http01/ensurePod "msg"="found one existing HTTP01 solver pod" "dnsName"="domain.example.com" "related_resource_kind"="Pod" "related_resource_name"="cm-acme-http-solver-k8wl8" "related_resource_namespace"="default" "related_resource_version"="v1" "resource_kind"="Challenge" "resource_name"="domain.example.com-r98k2-2809069211-587139531" "resource_namespace"="default" "resource_version"="v1" "type"="HTTP-01"
I0413 12:38:05.263416 1 service.go:43] cert-manager/challenges/http01/selfCheck/http01/ensureService "msg"="found one existing HTTP01 solver Service for challenge resource" "dnsName"="domain.example.com" "related_resource_kind"="Service" "related_resource_name"="cm-acme-http-solver-gvvkt" "related_resource_namespace"="default" "related_resource_version"="v1" "resource_kind"="Challenge" "resource_name"="domain.example.com-r98k2-2809069211-587139531" "resource_namespace"="default" "resource_version"="v1" "type"="HTTP-01"
I0413 12:38:05.263822 1 ingress.go:99] cert-manager/challenges/http01/selfCheck/http01/ensureIngress "msg"="found one existing HTTP01 solver ingress" "dnsName"="domain.example.com" "related_resource_kind"="Ingress" "related_resource_name"="cm-acme-http-solver-pbs7c" "related_resource_namespace"="default" "related_resource_version"="v1" "resource_kind"="Challenge" "resource_name"="domain.example.com-r98k2-2809069211-587139531" "resource_namespace"="default" "resource_version"="v1" "type"="HTTP-01"
E0413 12:38:25.541964 1 sync.go:386] cert-manager/challenges/acceptChallenge "msg"="error waiting for authorization" "error"="context deadline exceeded" "dnsName"="domain.example.com" "resource_kind"="Challenge" "resource_name"="domain.example.com-r98k2-2809069211-587139531" "resource_namespace"="default" "resource_version"="v1" "type"="HTTP-01"
E0413 12:38:25.542087 1 controller.go:166] cert-manager/challenges "msg"="re-queuing item due to error processing" "error"="context deadline exceeded" "key"="default/domain.example.com-r98k2-2809069211-587139531"
I0413 12:38:30.542803 1 pod.go:59] cert-manager/challenges/http01/selfCheck/http01/ensurePod "msg"="found one existing HTTP01 solver pod" "dnsName"="domain.example.com" "related_resource_kind"="Pod" "related_resource_name"="cm-acme-http-solver-k8wl8" "related_resource_namespace"="default" "related_resource_version"="v1" "resource_kind"="Challenge" "resource_name"="domain.example.com-r98k2-2809069211-587139531" "resource_namespace"="default" "resource_version"="v1" "type"="HTTP-01"
I0413 12:38:30.543062 1 service.go:43] cert-manager/challenges/http01/selfCheck/http01/ensureService "msg"="found one existing HTTP01 solver Service for challenge resource" "dnsName"="domain.example.com" "related_resource_kind"="Service" "related_resource_name"="cm-acme-http-solver-gvvkt" "related_resource_namespace"="default" "related_resource_version"="v1" "resource_kind"="Challenge" "resource_name"="domain.example.com-r98k2-2809069211-587139531" "resource_namespace"="default" "resource_version"="v1" "type"="HTTP-01"
I0413 12:38:30.543218 1 ingress.go:99] cert-manager/challenges/http01/selfCheck/http01/ensureIngress "msg"="found one existing HTTP01 solver ingress" "dnsName"="domain.example.com" "related_resource_kind"="Ingress" "related_resource_name"="cm-acme-http-solver-pbs7c" "related_resource_namespace"="default" "related_resource_version"="v1" "resource_kind"="Challenge" "resource_name"="domain.example.com-r98k2-2809069211-587139531" "resource_namespace"="default" "resource_version"="v1" "type"="HTTP-01"
E0413 12:38:46.682039 1 sync.go:386] cert-manager/challenges/acceptChallenge "msg"="error waiting for authorization" "error"="acme: authorization error for domain.example.com: 400 urn:ietf:params:acme:error:dns: During secondary validation: DNS problem: query timed out looking up A for domain.example.com; DNS problem: query timed out looking up AAAA for domain.example.com" "dnsName"="domain.example.com" "resource_kind"="Challenge" "resource_name"="domain.example.com-r98k2-2809069211-587139531" "resource_namespace"="default" "resource_version"="v1" "type"="HTTP-01"
E0413 12:38:46.888731 1 controller.go:102] ingress 'default/cm-acme-http-solver-pbs7c' in work queue no longer exists
Looking at Traefik's pod log, I can see that the ingress gets created, but that Traefik is unable to route any requests to it because it can't find the endpoint (this is what I tried to fix with the annotation in the ingressTemplate above):
time="2022-04-13T12:37:57Z" level=error msg="Skipping service: no endpoints found" providerName=kubernetes namespace=default servicePort="&ServiceBackendPort{Name:,Number:8089,}" ingress=cm-acme-http-solver-pbs7c serviceName=cm-acme-http-solver-gvvkt
time="2022-04-13T12:38:46Z" level=error msg="Skipping service: no endpoints found" serviceName=cm-acme-http-solver-gvvkt servicePort="&ServiceBackendPort{Name:,Number:8089,}" providerName=kubernetes ingress=cm-acme-http-solver-pbs7c namespace=default
time="2022-04-13T12:38:46Z" level=error msg="Cannot create service: service not found" servicePort="&ServiceBackendPort{Name:,Number:8089,}" providerName=kubernetes ingress=cm-acme-http-solver-pbs7c namespace=default serviceName=cm-acme-http-solver-gvvkt
time="2022-04-13T12:38:46Z" level=error msg="Cannot create service: service not found" servicePort="&ServiceBackendPort{Name:,Number:8089,}" namespace=default providerName=kubernetes serviceName=cm-acme-http-solver-gvvkt ingress=cm-acme-http-solver-pbs7c
And there's where I'm stuck currently, since the plan is to use Traefik's IngressRoute CRD for exposing hosts and not use regular ingress entries. Another option would be to test the experimental Gateway support, but as this is the initial setup for a prod cluster I'm not planning to go down that route yet.
Any ideas or further debug information that could be useful?
We faced the same issue. The problem was that the Ingress generated by cert-manager referenced the Ingress Controller through the deprecated annotation kubernetes.io/ingress.class.
What we wanted:
spec:
  ingressClassName: my-traefik-controller
What we got:
annotations:
  kubernetes.io/ingress.class: "my-traefik-controller"
This way, the traefik Ingress Controller found the Ingress, but was not able to find the service.
There is a whole discussion on this topic in the cert-manager GitHub repo.
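As a side note for readers on newer releases: as far as I know, cert-manager has since added an ingressClassName field to the HTTP-01 ingress solver (I believe from v1.12 onwards, so please check the docs for your version). If it is available, something like the sketch below should produce the "what we wanted" result directly. The issuer name and the account key Secret name are placeholders, and class and ingressClassName should not be set together:

```yaml
apiVersion: cert-manager.io/v1
kind: ClusterIssuer
metadata:
  name: my-issuer                      # placeholder issuer name
spec:
  acme:
    server: https://acme-staging-v02.api.letsencrypt.org/directory
    email: removed@example.com
    privateKeySecretRef:
      name: my-issuer-account-key      # placeholder Secret name
    solvers:
    - http01:
        ingress:
          # Assumes a cert-manager version that supports this field;
          # it replaces "class", which results in the deprecated annotation.
          ingressClassName: my-traefik-controller
```

We did not have this option at the time, so we went another way.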
The solution was to use the cert-manager annotation acme.cert-manager.io/http01-edit-in-place: "true" on an existing Ingress.
annotations:
  cert-manager.io/cluster-issuer: my-issuer
  acme.cert-manager.io/http01-edit-in-place: "true"
spec:
  ingressClassName: my-traefik-controller
This way, only the existing Ingress (containing the correct ingressClassName reference) gets modified and no new solver Ingress gets created.
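For context, here is roughly what a complete Ingress with those annotations could look like. This is only a sketch: the Ingress name, the backend Service my-app and its port are made up for illustration, and the host and secret name are borrowed from the question. The tls section is what lets cert-manager pick up the host and the target Secret from the Ingress.

```yaml
apiVersion: networking.k8s.io/v1
kind: Ingress
metadata:
  name: my-app                         # hypothetical Ingress name
  namespace: default
  annotations:
    cert-manager.io/cluster-issuer: my-issuer
    # Add the challenge path to this Ingress instead of creating a new one
    acme.cert-manager.io/http01-edit-in-place: "true"
spec:
  ingressClassName: my-traefik-controller
  tls:
  - hosts:
    - domain.example.com
    secretName: domain-example-com-tls
  rules:
  - host: domain.example.com
    http:
      paths:
      - path: /
        pathType: Prefix
        backend:
          service:
            name: my-app               # hypothetical backend Service
            port:
              number: 80               # assumed port
```

Note that this approach relies on a regular Ingress existing for the host, so it does not fit a setup that exposes everything through IngressRoute only.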