I am a bit desperate and I hope someone can help me. A few months ago I installed the Eclipse Cloud2Edge package on a Kubernetes cluster by following the installation instructions: creating a persistentVolume and running the helm install command with these options:
helm install -n $NS --wait --timeout 15m $RELEASE eclipse-iot/cloud2edge --set hono.prometheus.createInstance=false --set hono.grafana.enabled=false --dependency-update --debug
The YAML of the persistentVolume is the following; I create it in the same namespace in which I install the package:
apiVersion: v1
kind: PersistentVolume
metadata:
  name: pv-device-registry
spec:
  accessModes:
    - ReadWriteOnce
  capacity:
    storage: 1Mi
  hostPath:
    path: /mnt/
    type: Directory
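For completeness, this is just how I check that the volume and the claim that binds to it look fine after installing (the claim name is whatever the chart creates, so I simply list them all):
kubectl get pv pv-device-registry
kubectl get pvc -n $NS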
Everything worked perfectly and all pods were ready and running, until the other day when the cluster crashed and some pods stopped working.
The kubectl get pods -n $NS output is as follows:
NAME READY STATUS RESTARTS AGE
ditto-mongodb-7b78b468fb-8kshj 1/1 Running 0 50m
dt-adapter-amqp-vertx-6699ccf495-fc8nx 0/1 Running 0 50m
dt-adapter-http-vertx-545564ff9f-gx5fp 0/1 Running 0 50m
dt-adapter-mqtt-vertx-58c8975678-k5n49 0/1 Running 0 50m
dt-artemis-6759fb6cb8-5rq8p 1/1 Running 1 50m
dt-dispatch-router-5bc7586f76-57dwb 1/1 Running 0 50m
dt-ditto-concierge-f6d5f6f9c-pfmcw 1/1 Running 0 50m
dt-ditto-connectivity-f556db698-q89bw 1/1 Running 0 50m
dt-ditto-gateway-589d8f5596-59c5b 1/1 Running 0 50m
dt-ditto-nginx-897b5bc76-cx2dr 1/1 Running 0 50m
dt-ditto-policies-75cb5c6557-j5zdg 1/1 Running 0 50m
dt-ditto-swaggerui-6f6f989ccd-jkhsk 1/1 Running 0 50m
dt-ditto-things-79ff869bc9-l9lct 1/1 Running 0 50m
dt-ditto-thingssearch-58c5578bb9-pwd9k 1/1 Running 0 50m
dt-service-auth-698d4cdfff-ch5wp 1/1 Running 0 50m
dt-service-command-router-59d6556b5f-4nfcj 0/1 Running 0 50m
dt-service-device-registry-7cf75d794f-pk9ct 0/1 Running 0 50m
The pods that fail all show the same error when running kubectl describe pod POD_NAME -n $NS:
Events:
Type Reason Age From Message
---- ------ ---- ---- -------
Normal Scheduled 53m default-scheduler Successfully assigned digitaltwins/dt-service-command-router-59d6556b5f-4nfcj to node1
Normal Pulled 53m kubelet Container image "index.docker.io/eclipse/hono-service-command-router:1.8.0" already present on machine
Normal Created 53m kubelet Created container service-command-router
Normal Started 53m kubelet Started container service-command-router
Warning Unhealthy 52m kubelet Readiness probe failed: Get "https://10.244.1.89:8088/readiness": net/http: request canceled (Client.Timeout exceeded while awaiting headers)
Warning Unhealthy 2m58s (x295 over 51m) kubelet Readiness probe failed: HTTP probe failed with statuscode: 503
According to this, the readinessProbe fails. In the YAML definition of the affected deployments, the readinessProbe is defined as follows:
readinessProbe:
  failureThreshold: 3
  httpGet:
    path: /readiness
    port: health
    scheme: HTTPS
  initialDelaySeconds: 45
  periodSeconds: 10
  successThreshold: 1
  timeoutSeconds: 1
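For reference, this is roughly how I raise the probe values (the deployment name is just one of the affected ones and the container index 0 is an assumption; I apply the same change to every failing deployment):
kubectl -n $NS patch deployment dt-service-command-router --type=json -p='[{"op":"replace","path":"/spec/template/spec/containers/0/readinessProbe/initialDelaySeconds","value":600},{"op":"replace","path":"/spec/template/spec/containers/0/readinessProbe/timeoutSeconds","value":10}]'
And this is roughly how I call the readiness endpoint by hand, exposing port 8088 (health) via a port-forward:
kubectl -n $NS port-forward deploy/dt-service-command-router 8088:8088 &
wget --no-check-certificate -S -O- https://localhost:8088/readiness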
I have tried increasing these values, raising the delay to 600 and the timeout to 10, and I have also tried uninstalling the package and installing it again, but nothing changes: the installation fails because the pods never become ready and the timeout pops up. I have also exposed port 8088 (health) and called /readiness with wget (roughly as sketched above), and the result is still 503. On the other hand, I have tested the livenessProbe and it works fine. I have also tried resetting the cluster: first I manually deleted everything in it and then ran the following commands:
sudo kubeadm reset
sudo iptables -F && sudo iptables -t nat -F && sudo iptables -t mangle -F && sudo iptables -X
sudo systemctl stop kubelet
sudo systemctl stop docker
sudo rm -rf /var/lib/cni/
sudo rm -rf /var/lib/kubelet/*
sudo rm -rf /etc/cni/
sudo ifconfig cni0 down
sudo ifconfig flannel.1 down
sudo ifconfig docker0 down
sudo ip link set cni0 down
sudo brctl delbr cni0
sudo systemctl start docker
sudo kubeadm init --apiserver-advertise-address=192.168.44.11 --pod-network-cidr=10.244.0.0/16
mkdir -p $HOME/.kube
sudo cp -i /etc/kubernetes/admin.conf $HOME/.kube/config
sudo chown $(id -u):$(id -g) $HOME/.kube/config
kubectl --kubeconfig $HOME/.kube/config apply -f https://raw.githubusercontent.com/coreos/flannel/master/Documentation/kube-flannel.yml
The cluster itself seems to work fine, since the Eclipse Ditto part has no problems; it is only the Eclipse Hono part that fails. I am adding a little more information in case it is useful.
The kubectl logs dt-service-command-router-b654c8dcb-s2g6t -n $NS output:
12:30:06.340 [vert.x-eventloop-thread-1] ERROR io.vertx.core.net.impl.NetServerImpl - Client from origin /10.244.1.101:44142 failed to connect over ssl: javax.net.ssl.SSLHandshakeException: Received fatal alert: certificate_unknown
12:30:06.756 [vert.x-eventloop-thread-1] ERROR io.vertx.core.net.impl.NetServerImpl - Client from origin /10.244.1.100:46550 failed to connect over ssl: javax.net.ssl.SSLHandshakeException: Received fatal alert: certificate_unknown
12:30:07.876 [vert.x-eventloop-thread-1] ERROR io.vertx.core.net.impl.NetServerImpl - Client from origin /10.244.1.102:40706 failed to connect over ssl: javax.net.ssl.SSLHandshakeException: Received fatal alert: certificate_unknown
12:30:08.315 [vert.x-eventloop-thread-1] DEBUG o.e.h.client.impl.HonoConnectionImpl - starting attempt [#258] to connect to server [dt-service-device-registry:5671, role: Device Registration]
12:30:08.315 [vert.x-eventloop-thread-1] DEBUG o.e.h.c.impl.ConnectionFactoryImpl - OpenSSL [available: false, supports KeyManagerFactory: false]
12:30:08.315 [vert.x-eventloop-thread-1] DEBUG o.e.h.c.impl.ConnectionFactoryImpl - using JDK's default SSL engine
12:30:08.315 [vert.x-eventloop-thread-1] DEBUG o.e.h.c.impl.ConnectionFactoryImpl - enabling secure protocol [TLSv1.3]
12:30:08.315 [vert.x-eventloop-thread-1] DEBUG o.e.h.c.impl.ConnectionFactoryImpl - enabling secure protocol [TLSv1.2]
12:30:08.315 [vert.x-eventloop-thread-1] DEBUG o.e.h.c.impl.ConnectionFactoryImpl - connecting to AMQP 1.0 container [amqps://dt-service-device-registry:5671, role: Device Registration]
12:30:08.339 [vert.x-eventloop-thread-1] DEBUG o.e.h.c.impl.ConnectionFactoryImpl - can't connect to AMQP 1.0 container [amqps://dt-service-device-registry:5671, role: Device Registration]: Failed to create SSL connection
12:30:08.339 [vert.x-eventloop-thread-1] WARN o.e.h.client.impl.HonoConnectionImpl - attempt [#258] to connect to server [dt-service-device-registry:5671, role: Device Registration] failed
javax.net.ssl.SSLHandshakeException: Failed to create SSL connection
The kubectl logs dt-adapter-amqp-vertx-74d69cbc44-7kmdq -n $NS output:
12:19:36.686 [vert.x-eventloop-thread-0] DEBUG o.e.h.client.impl.HonoConnectionImpl - starting attempt [#19] to connect to server [dt-service-device-registry:5671, role: Credentials]
12:19:36.686 [vert.x-eventloop-thread-0] DEBUG o.e.h.c.impl.ConnectionFactoryImpl - OpenSSL [available: false, supports KeyManagerFactory: false]
12:19:36.686 [vert.x-eventloop-thread-0] DEBUG o.e.h.c.impl.ConnectionFactoryImpl - using JDK's default SSL engine
12:19:36.686 [vert.x-eventloop-thread-0] DEBUG o.e.h.c.impl.ConnectionFactoryImpl - enabling secure protocol [TLSv1.3]
12:19:36.686 [vert.x-eventloop-thread-0] DEBUG o.e.h.c.impl.ConnectionFactoryImpl - enabling secure protocol [TLSv1.2]
12:19:36.686 [vert.x-eventloop-thread-0] DEBUG o.e.h.c.impl.ConnectionFactoryImpl - connecting to AMQP 1.0 container [amqps://dt-service-device-registry:5671, role: Credentials]
12:19:36.711 [vert.x-eventloop-thread-0] DEBUG o.e.h.c.impl.ConnectionFactoryImpl - can't connect to AMQP 1.0 container [amqps://dt-service-device-registry:5671, role: Credentials]: Failed to create SSL connection
12:19:36.712 [vert.x-eventloop-thread-0] WARN o.e.h.client.impl.HonoConnectionImpl - attempt [#19] to connect to server [dt-service-device-registry:5671, role: Credentials] failed
javax.net.ssl.SSLHandshakeException: Failed to create SSL connection
The kubectl version output is as follows:
Client Version: version.Info{Major:"1", Minor:"19", GitVersion:"v1.19.3", GitCommit:"1e11e4a2108024935ecfcb2912226cedeafd99df", GitTreeState:"clean", BuildDate:"2020-10-14T12:50:19Z", GoVersion:"go1.15.2", Compiler:"gc", Platform:"linux/amd64"}
Server Version: version.Info{Major:"1", Minor:"19", GitVersion:"v1.19.16", GitCommit:"e37e4ab4cc8dcda84f1344dda47a97bb1927d074", GitTreeState:"clean", BuildDate:"2021-10-27T16:20:18Z", GoVersion:"go1.15.15", Compiler:"gc", Platform:"linux/amd64"}
Thanks in advance!
Based on the iconic "Failed to create SSL connection" output in the logs, I assume that you have run into the dreaded "the demo certificates included in the Hono chart have expired" problem.
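If you want to double-check, you can inspect the validity dates of the certificate that the device registry presents on its AMQPS port, for example with a port-forward and openssl (just one way to do it; any TLS client that prints the server certificate will work):
kubectl -n $NS port-forward svc/dt-service-device-registry 5671:5671 &
openssl s_client -connect localhost:5671 </dev/null 2>/dev/null | openssl x509 -noout -dates
A notAfter date in the past confirms the expired demo certificate.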
The Cloud2Edge package chart is currently being updated (https://github.com/eclipse/packages/pull/337) with the most recent versions of the Ditto and Hono charts (which include fresh certificates that are valid for the next two years). As soon as that PR is merged and the Eclipse Packages chart repository has been rebuilt, you should be able to do a helm repo update and then (hopefully) successfully install the c2e package.
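Roughly, reusing the options from your original install (uninstall the broken release first if it is still around):
helm repo update
helm uninstall -n $NS $RELEASE
helm install -n $NS --wait --timeout 15m $RELEASE eclipse-iot/cloud2edge --set hono.prometheus.createInstance=false --set hono.grafana.enabled=false --dependency-update --debug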