I've accidentally drained/uncordoned all nodes in Kubernetes (even master) and now I'm trying to bring it back by connecting to the ETCD and manually change some keys in there. I successfuly bashed into etcd container:
$ docker ps
CONTAINER ID IMAGE COMMAND CREATED STATUS PORTS NAMES
8fbcb67da963 quay.io/coreos/etcd:v3.3.10 "/usr/local/bin/etcd" 17 hours ago Up 17 hours etcd1
a0d6426df02a cd48205a40f0 "kube-controller-man…" 17 hours ago Up 17 hours k8s_kube-controller-manager_kube-controller-manager-node1_kube-system_0441d7804a7366fd957f8b402008efe5_16
5fa8e47441a0 6bed756ced73 "kube-scheduler --au…" 17 hours ago Up 17 hours k8s_kube-scheduler_kube-scheduler-node1_kube-system_6f33d7866b72ca1b13c79edd42fa8dc6_14
2c8e07cf499f gcr.io/google_containers/pause-amd64:3.1 "/pause" 17 hours ago Up 17 hours k8s_POD_kube-scheduler-node1_kube-system_6f33d7866b72ca1b13c79edd42fa8dc6_3
2ca43282ea1c gcr.io/google_containers/pause-amd64:3.1 "/pause" 17 hours ago Up 17 hours k8s_POD_kube-controller-manager-node1_kube-system_0441d7804a7366fd957f8b402008efe5_3
9473644a3333 gcr.io/google_containers/pause-amd64:3.1 "/pause" 17 hours ago Up 17 hours k8s_POD_kube-apiserver-node1_kube-system_93ff1a9840f77f8b2b924a85815e17fe_3
and then I run:
docker exec -it 8fbcb67da963 /bin/sh
and then I try to run the following:
ETCDCTL_API=3 etcdctl --endpoints https://172.16.16.111:2379 --cacert /etc/ssl/etcd/ssl/ca.pem --key /etc/ssl/etcd/ssl/member-node1-key.pem --cert /etc/ssl/etcd/ssl/member-node1.pem get / --prefix=true -w json --debug
and here is the result I get:
ETCDCTL_CACERT=/etc/ssl/etcd/ssl/ca.pem
ETCDCTL_CERT=/etc/ssl/etcd/ssl/member-node1.pem
ETCDCTL_COMMAND_TIMEOUT=5s
ETCDCTL_DEBUG=true
ETCDCTL_DIAL_TIMEOUT=2s
ETCDCTL_DISCOVERY_SRV=
ETCDCTL_ENDPOINTS=[https://172.16.16.111:2379]
ETCDCTL_HEX=false
ETCDCTL_INSECURE_DISCOVERY=true
ETCDCTL_INSECURE_SKIP_TLS_VERIFY=false
ETCDCTL_INSECURE_TRANSPORT=true
ETCDCTL_KEEPALIVE_TIME=2s
ETCDCTL_KEEPALIVE_TIMEOUT=6s
ETCDCTL_KEY=/etc/ssl/etcd/ssl/member-node1-key.pem
ETCDCTL_USER=
ETCDCTL_WRITE_OUT=json
INFO: 2020/06/24 15:44:07 ccBalancerWrapper: updating state and picker called by balancer: IDLE, 0xc420246c00
INFO: 2020/06/24 15:44:07 dialing to target with scheme: ""
INFO: 2020/06/24 15:44:07 could not get resolver for scheme: ""
INFO: 2020/06/24 15:44:07 balancerWrapper: is pickfirst: false
INFO: 2020/06/24 15:44:07 balancerWrapper: got update addr from Notify: [{172.16.16.111:2379 <nil>}]
INFO: 2020/06/24 15:44:07 ccBalancerWrapper: new subconn: [{172.16.16.111:2379 0 <nil>}]
INFO: 2020/06/24 15:44:07 balancerWrapper: handle subconn state change: 0xc4201708d0, CONNECTING
INFO: 2020/06/24 15:44:07 ccBalancerWrapper: updating state and picker called by balancer: CONNECTING, 0xc420246c00
Error: context deadline exceeded
Here is my etcd.env:
# Environment file for etcd v3.3.10
ETCD_DATA_DIR=/var/lib/etcd
ETCD_ADVERTISE_CLIENT_URLS=https://172.16.16.111:2379
ETCD_INITIAL_ADVERTISE_PEER_URLS=https://172.16.16.111:2380
ETCD_INITIAL_CLUSTER_STATE=existing
ETCD_METRICS=basic
ETCD_LISTEN_CLIENT_URLS=https://172.16.16.111:2379,https://127.0.0.1:2379
ETCD_ELECTION_TIMEOUT=5000
ETCD_HEARTBEAT_INTERVAL=250
ETCD_INITIAL_CLUSTER_TOKEN=k8s_etcd
ETCD_LISTEN_PEER_URLS=https://172.16.16.111:2380
ETCD_NAME=etcd1
ETCD_PROXY=off
ETCD_INITIAL_CLUSTER=etcd1=https://172.16.16.111:2380,etcd2=https://172.16.16.112:2380,etcd3=https://172.16.16.113:2380
ETCD_AUTO_COMPACTION_RETENTION=8
ETCD_SNAPSHOT_COUNT=10000
# TLS settings
ETCD_TRUSTED_CA_FILE=/etc/ssl/etcd/ssl/ca.pem
ETCD_CERT_FILE=/etc/ssl/etcd/ssl/member-node1.pem
ETCD_KEY_FILE=/etc/ssl/etcd/ssl/member-node1-key.pem
ETCD_CLIENT_CERT_AUTH=true
ETCD_PEER_TRUSTED_CA_FILE=/etc/ssl/etcd/ssl/ca.pem
ETCD_PEER_CERT_FILE=/etc/ssl/etcd/ssl/member-node1.pem
ETCD_PEER_KEY_FILE=/etc/ssl/etcd/ssl/member-node1-key.pem
ETCD_PEER_CLIENT_CERT_AUTH=True
Update 1:
Here is my kubeadm-config.yaml:
apiVersion: kubeadm.k8s.io/v1beta2
kind: InitConfiguration
localAPIEndpoint:
advertiseAddress: 172.16.16.111
bindPort: 6443
certificateKey: d73faece88f86e447eea3ca38f7b07e0a1f0bbb886567fee3b8cf8848b1bf8dd
nodeRegistration:
name: node1
taints: []
criSocket: /var/run/dockershim.sock
---
apiVersion: kubeadm.k8s.io/v1beta2
kind: ClusterConfiguration
clusterName: cluster.local
etcd:
external:
endpoints:
- https://172.16.16.111:2379
- https://172.16.16.112:2379
- https://172.16.16.113:2379
caFile: /etc/ssl/etcd/ssl/ca.pem
certFile: /etc/ssl/etcd/ssl/node-node1.pem
keyFile: /etc/ssl/etcd/ssl/node-node1-key.pem
dns:
type: CoreDNS
imageRepository: docker.io/coredns
imageTag: 1.6.0
networking:
dnsDomain: cluster.local
serviceSubnet: 10.233.0.0/18
podSubnet: 10.233.64.0/18
kubernetesVersion: v1.16.6
controlPlaneEndpoint: 172.16.16.111:6443
certificatesDir: /etc/kubernetes/ssl
imageRepository: gcr.io/google-containers
apiServer:
extraArgs:
anonymous-auth: "True"
authorization-mode: Node,RBAC
bind-address: 0.0.0.0
insecure-port: "0"
apiserver-count: "1"
endpoint-reconciler-type: lease
service-node-port-range: 30000-32767
kubelet-preferred-address-types: "InternalDNS,InternalIP,Hostname,ExternalDNS,ExternalIP"
profiling: "False"
request-timeout: "1m0s"
enable-aggregator-routing: "False"
storage-backend: etcd3
runtime-config:
allow-privileged: "true"
extraVolumes:
- name: usr-share-ca-certificates
hostPath: /usr/share/ca-certificates
mountPath: /usr/share/ca-certificates
readOnly: true
certSANs:
- kubernetes
- kubernetes.default
- kubernetes.default.svc
- kubernetes.default.svc.cluster.local
- 10.233.0.1
- localhost
- 127.0.0.1
- node1
- lb-apiserver.kubernetes.local
- 172.16.16.111
- node1.cluster.local
timeoutForControlPlane: 5m0s
controllerManager:
extraArgs:
node-monitor-grace-period: 40s
node-monitor-period: 5s
pod-eviction-timeout: 5m0s
node-cidr-mask-size: "24"
profiling: "False"
terminated-pod-gc-threshold: "12500"
bind-address: 0.0.0.0
configure-cloud-routes: "false"
scheduler:
extraArgs:
bind-address: 0.0.0.0
extraVolumes:
---
apiVersion: kubeproxy.config.k8s.io/v1alpha1
kind: KubeProxyConfiguration
bindAddress: 0.0.0.0
clientConnection:
acceptContentTypes:
burst: 10
contentType: application/vnd.kubernetes.protobuf
kubeconfig:
qps: 5
clusterCIDR: 10.233.64.0/18
configSyncPeriod: 15m0s
conntrack:
maxPerCore: 32768
min: 131072
tcpCloseWaitTimeout: 1h0m0s
tcpEstablishedTimeout: 24h0m0s
enableProfiling: False
healthzBindAddress: 0.0.0.0:10256
hostnameOverride: node1
iptables:
masqueradeAll: False
masqueradeBit: 14
minSyncPeriod: 0s
syncPeriod: 30s
ipvs:
excludeCIDRs: []
minSyncPeriod: 0s
scheduler: rr
syncPeriod: 30s
strictARP: False
metricsBindAddress: 127.0.0.1:10249
mode: ipvs
nodePortAddresses: []
oomScoreAdj: -999
portRange:
udpIdleTimeout: 250ms
---
apiVersion: kubelet.config.k8s.io/v1beta1
kind: KubeletConfiguration
clusterDNS:
- 169.254.25.10
Update 2:
Contents of /etc/kubernetes/manigests/kube-apiserver.yaml:
apiVersion: v1
kind: Pod
metadata:
creationTimestamp: null
labels:
component: kube-apiserver
tier: control-plane
name: kube-apiserver
namespace: kube-system
spec:
containers:
- command:
- kube-apiserver
- --advertise-address=172.16.16.111
- --allow-privileged=true
- --anonymous-auth=True
- --apiserver-count=1
- --authorization-mode=Node,RBAC
- --bind-address=0.0.0.0
- --client-ca-file=/etc/kubernetes/ssl/ca.crt
- --enable-admission-plugins=NodeRestriction
- --enable-aggregator-routing=False
- --enable-bootstrap-token-auth=true
- --endpoint-reconciler-type=lease
- --etcd-cafile=/etc/ssl/etcd/ssl/ca.pem
- --etcd-certfile=/etc/ssl/etcd/ssl/node-node1.pem
- --etcd-keyfile=/etc/ssl/etcd/ssl/node-node1-key.pem
- --etcd-servers=https://172.16.16.111:2379,https://172.16.16.112:2379,https://172.16.16.113:2379
- --insecure-port=0
- --kubelet-client-certificate=/etc/kubernetes/ssl/apiserver-kubelet-client.crt
- --kubelet-client-key=/etc/kubernetes/ssl/apiserver-kubelet-client.key
- --kubelet-preferred-address-types=InternalDNS,InternalIP,Hostname,ExternalDNS,ExternalIP
- --profiling=False
- --proxy-client-cert-file=/etc/kubernetes/ssl/front-proxy-client.crt
- --proxy-client-key-file=/etc/kubernetes/ssl/front-proxy-client.key
- --request-timeout=1m0s
- --requestheader-allowed-names=front-proxy-client
- --requestheader-client-ca-file=/etc/kubernetes/ssl/front-proxy-ca.crt
- --requestheader-extra-headers-prefix=X-Remote-Extra-
- --requestheader-group-headers=X-Remote-Group
- --requestheader-username-headers=X-Remote-User
- --runtime-config=
- --secure-port=6443
- --service-account-key-file=/etc/kubernetes/ssl/sa.pub
- --service-cluster-ip-range=10.233.0.0/18
- --service-node-port-range=30000-32767
- --storage-backend=etcd3
- --tls-cert-file=/etc/kubernetes/ssl/apiserver.crt
- --tls-private-key-file=/etc/kubernetes/ssl/apiserver.key
image: gcr.io/google-containers/kube-apiserver:v1.16.6
imagePullPolicy: IfNotPresent
livenessProbe:
failureThreshold: 8
httpGet:
host: 172.16.16.111
path: /healthz
port: 6443
scheme: HTTPS
initialDelaySeconds: 15
timeoutSeconds: 15
name: kube-apiserver
resources:
requests:
cpu: 250m
volumeMounts:
- mountPath: /etc/ssl/certs
name: ca-certs
readOnly: true
- mountPath: /etc/ca-certificates
name: etc-ca-certificates
readOnly: true
- mountPath: /etc/ssl/etcd/ssl
name: etcd-certs-0
readOnly: true
- mountPath: /etc/kubernetes/ssl
name: k8s-certs
readOnly: true
- mountPath: /usr/local/share/ca-certificates
name: usr-local-share-ca-certificates
readOnly: true
- mountPath: /usr/share/ca-certificates
name: usr-share-ca-certificates
readOnly: true
hostNetwork: true
priorityClassName: system-cluster-critical
volumes:
- hostPath:
path: /etc/ssl/certs
type: DirectoryOrCreate
name: ca-certs
- hostPath:
path: /etc/ca-certificates
type: DirectoryOrCreate
name: etc-ca-certificates
- hostPath:
path: /etc/ssl/etcd/ssl
type: DirectoryOrCreate
name: etcd-certs-0
- hostPath:
path: /etc/kubernetes/ssl
type: DirectoryOrCreate
name: k8s-certs
- hostPath:
path: /usr/local/share/ca-certificates
type: DirectoryOrCreate
name: usr-local-share-ca-certificates
- hostPath:
path: /usr/share/ca-certificates
type: ""
name: usr-share-ca-certificates
status: {}
I used kubespray to install the cluster.
How can I connect to the etcd? Any help would be appreciated.
This context deadline exceeded
generally happens because of
Using wrong certificates. You could be using peer certificates instead of client certificates. You need to check the Kubernetes API Server parameters which will tell you where are the client certificates located because Kubernetes API Server is a client to ETCD. Then you can use those same certificates in the etcdctl
command from the node.
The etcd cluster is not operational anymore because peer members are down.