[SOLVED] Failed to open topo server on vitess with etcd

I'm running a simple example with Helm. The values I pass to the chart are shown below:

cat << EOF | helm install helm/vitess -n vitess -f -
topology:
  cells:
    - name: 'zone1'
      keyspaces:
        - name: 'vitess'
          shards:
            - name: '0'
              tablets:
                - type: 'replica'
                  vttablet:
                    replicas: 1
      mysqlProtocol:
        enabled: true
        authType: secret
        username: vitess
        passwordSecret: vitess-db-password
      etcd:
        replicas: 3
      vtctld:
        replicas: 1
      vtgate:
        replicas: 3

vttablet:
  dataVolumeClaimSpec:
    storageClassName: nfs-slow
EOF

Here is the list of pods currently running:

$ kubectl get pods --all-namespaces
NAMESPACE     NAME                                     READY   STATUS                  RESTARTS   AGE
kube-system   coredns-fb8b8dccf-8f5kt                  1/1     Running                 0          32m
kube-system   coredns-fb8b8dccf-qbd6c                  1/1     Running                 0          32m
kube-system   etcd-master1                             1/1     Running                 0          32m
kube-system   kube-apiserver-master1                   1/1     Running                 0          31m
kube-system   kube-controller-manager-master1          1/1     Running                 0          32m
kube-system   kube-flannel-ds-amd64-bkg9z              1/1     Running                 0          32m
kube-system   kube-flannel-ds-amd64-q8vh4              1/1     Running                 0          32m
kube-system   kube-flannel-ds-amd64-vqmnz              1/1     Running                 0          32m
kube-system   kube-proxy-bd8mf                         1/1     Running                 0          32m
kube-system   kube-proxy-nlc2b                         1/1     Running                 0          32m
kube-system   kube-proxy-x7cd5                         1/1     Running                 0          32m
kube-system   kube-scheduler-master1                   1/1     Running                 0          32m
kube-system   tiller-deploy-8458f6c667-cx2mv           1/1     Running                 0          27m
vitess        etcd-global-6pwvnv29th                   0/1     Init:0/1                0          16m
vitess        etcd-operator-84db9bc774-j4wml           1/1     Running                 0          30m
vitess        etcd-zone1-zwgvd7spzc                    0/1     Init:0/1                0          16m
vitess        vtctld-86cd78b6f5-zgfqg                  0/1     CrashLoopBackOff        7          16m
vitess        vtgate-zone1-58744956c4-x8ms2            0/1     CrashLoopBackOff        7          16m
vitess        zone1-vitess-0-init-shard-master-mbbph   1/1     Running                 0          16m
vitess        zone1-vitess-0-replica-0                 0/6     Init:CrashLoopBackOff   7          16m

The vtctld logs show this error:

$ kubectl logs -n vitess vtctld-86cd78b6f5-zgfqg
++ cat
+ eval exec /vt/bin/vtctld '-cell="zone1"' '-web_dir="/vt/web/vtctld"' '-web_dir2="/vt/web/vtctld2/app"' -workflow_manager_init -workflow_manager_use_election -logtostderr=true -stderrthreshold=0 -port=15000 -grpc_port=15999 '-service_map="grpc-vtctl"' '-topo_implementation="etcd2"' '-topo_global_server_address="etcd-global-client.vitess:2379"' -topo_global_root=/vitess/global
++ exec /vt/bin/vtctld -cell=zone1 -web_dir=/vt/web/vtctld -web_dir2=/vt/web/vtctld2/app -workflow_manager_init -workflow_manager_use_election -logtostderr=true -stderrthreshold=0 -port=15000 -grpc_port=15999 -service_map=grpc-vtctl -topo_implementation=etcd2 -topo_global_server_address=etcd-global-client.vitess:2379 -topo_global_root=/vitess/global
ERROR: logging before flag.Parse: E0422 02:35:34.020928       1 syslogger.go:122] can't connect to syslog
F0422 02:35:39.025400       1 server.go:221] Failed to open topo server (etcd2,etcd-global-client.vitess:2379,/vitess/global): grpc: timed out when dialing

I'm running this under Vagrant with 1 master and 2 nodes. I suspect that it is an issue with eth1.

The storage is configured to use NFS.

$ kubectl logs etcd-operator-84db9bc774-j4wml
time="2019-04-22T17:26:51Z" level=info msg="skip reconciliation: running ([]), pending ([etcd-zone1-zwgvd7spzc])" cluster-name=etcd-zone1 cluster-namespace=vitess pkg=cluster
time="2019-04-22T17:26:51Z" level=info msg="skip reconciliation: running ([]), pending ([etcd-zone1-zwgvd7spzc])" cluster-name=etcd-global cluster-namespace=vitess pkg=cluster

Solution

It appears that etcd is not fully initializing. Note that neither the pod for the global lockserver (etcd-global-6pwvnv29th) nor the local one for cell zone1 (etcd-zone1-zwgvd7spzc) is ready; both are stuck in Init:0/1.
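
To spot pods in this state quickly, the pod listing can be filtered on its READY column. The following is only a minimal sketch: the helper name not_ready_pods is hypothetical (not part of kubectl or Vitess), and it assumes the default column layout of `kubectl get pods --all-namespaces` shown in the question.

```shell
# Hypothetical helper (not part of kubectl or Vitess): reads the output of
# `kubectl get pods --all-namespaces` on stdin and prints "namespace/name status"
# for every pod whose ready-container count is below its total.
not_ready_pods() {
  awk 'NR > 1 {                      # skip the header row
         split($3, ready, "/")       # READY column, e.g. "0/1"
         if (ready[1] != ready[2])   # fewer containers ready than expected
           print $1 "/" $2, $4       # namespace/name and STATUS
       }'
}

# Usage, assuming kubectl is configured for the cluster:
#   kubectl get pods --all-namespaces | not_ready_pods
```

For each etcd pod it reports, `kubectl describe pod -n vitess <pod-name>` should list the init container's state and recent events, which is usually the next place to look for the cause.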