apache-zookeeper, kubernetes-statefulset, livenessprobe

ZooKeeper StatefulSet requires all instances to be manually restarted after an upgrade


I am running a 3-instance ZooKeeper Kubernetes StatefulSet for my Kafka clusters, with pods named zookeeper-0, zookeeper-1 and zookeeper-2, and I have a liveness probe enabled that uses the ruok command. If any pod of the StatefulSet is restarted due to a failure, the quorum breaks and ZooKeeper stops working, even after the failed pod has come back up and its liveness probe reports ok.
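
For reference, the liveness probe is roughly the following (a sketch, not my exact manifest; port 2181 and the nc invocation are just the usual way of sending the ruok command):

    livenessProbe:
      exec:
        command:
          - sh
          - -c
          # a healthy ZooKeeper answers "imok" on its client port
          - 'echo ruok | nc 127.0.0.1 2181'
      initialDelaySeconds: 30
      periodSeconds: 30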

When this happens I have to manually restart all ZooKeeper instances to get the ensemble working again. The same thing happens when I do a helm upgrade: during an upgrade the first instance to be restarted is zookeeper-2, then zookeeper-1 and finally zookeeper-0, but it seems ZooKeeper only recovers if I start all instances together. So after every helm upgrade I have to restart all instances manually.
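
For anyone trying to reproduce this, the quorum state of each pod can be checked like this (assuming the stock ZooKeeper image, where zkServer.sh is on the PATH); a healthy ensemble reports one leader and two followers:

    # print the role (leader/follower) reported by each ensemble member
    for i in 0 1 2; do
      kubectl exec zookeeper-$i -- zkServer.sh status
    done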

My question is:

What could be the reason for this behaviour? And what is the best way to run a fully reliable ZooKeeper StatefulSet in a Kubernetes environment?


Solution

  • This is a bug in ZooKeeper 3.5.8, https://issues.apache.org/jira/browse/ZOOKEEPER-3829. If anyone comes across this issue, please use the latest ZooKeeper release.

    When you do a helm upgrade, the instances are redeployed in reverse order. So if you have zookeeper-0, zookeeper-1 and zookeeper-2, then zookeeper-2 is upgraded first, then zookeeper-1 and finally zookeeper-0. The issue is very evident if you have 5 instances.
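
    This ordering comes from the StatefulSet's default rolling update strategy: pods are replaced one at a time from the highest ordinal down to the lowest, and each replaced pod has to become Ready before the controller moves on to the next one. A sketch of the relevant spec fields (nothing here is specific to the chart):

        spec:
          # default behaviour, shown explicitly: update zookeeper-2 first,
          # then zookeeper-1, then zookeeper-0, waiting for Ready in between
          updateStrategy:
            type: RollingUpdate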

    To work around this instability I added liveness and readiness probes, so ZooKeeper is restarted automatically when the issue happens and the ensemble self-heals.

    For the Kubernetes StatefulSet:

        livenessProbe:
          exec:
            command:
              - sh
              - -c
              - /bin/liveness-check.sh 2181
          initialDelaySeconds: 120   # default 0
          periodSeconds: 60
          timeoutSeconds: 10
          failureThreshold: 2
          successThreshold: 1
        readinessProbe:
          exec:
            command:
              - sh
              - -c
              - /bin/readiness-check.sh 2181
          initialDelaySeconds: 20
          periodSeconds: 30
          timeoutSeconds: 5
          failureThreshold: 2
          successThreshold: 1
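
    The split between the two probes matters here: a failing liveness probe makes the kubelet restart the container (that is where the self-healing comes from), while a failing readiness probe only removes the pod from the Service endpoints so traffic is not sent to an instance that is not serving yet. You can watch the self-healing happen via the pod events and the restart counter (the app=zookeeper label is an assumption, use whatever label your chart sets):

        kubectl describe pod zookeeper-0      # probe failures appear under Events
        kubectl get pods -l app=zookeeper     # restarts show up in the RESTARTS column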
    

    The scripts are defined in a Kubernetes ConfigMap:

        data:
          liveness-check.sh: |
            #!/bin/sh
            zkServer.sh status
          readiness-check.sh: |
            #!/bin/sh
            echo ruok | nc 127.0.0.1 $1
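
    For the probes to find these scripts at /bin/liveness-check.sh and /bin/readiness-check.sh, the ConfigMap has to be mounted into the container with execute permission. A sketch of the relevant StatefulSet fields (the ConfigMap name zookeeper-health-checks is made up here, use whatever your chart calls it):

        volumes:
          - name: health-checks
            configMap:
              name: zookeeper-health-checks
              defaultMode: 0755            # scripts must be executable
        containers:
          - name: zookeeper
            volumeMounts:
              - name: health-checks
                mountPath: /bin/liveness-check.sh
                subPath: liveness-check.sh
              - name: health-checks
                mountPath: /bin/readiness-check.sh
                subPath: readiness-check.sh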
    

    If for some reason you don't have the zkServer.sh script in the Docker image, you can define the ConfigMap as follows:

          readiness-check.sh: |
            #!/bin/sh
            # expect the "imok" reply to the ruok four-letter-word command
            OK=$(echo ruok | nc 127.0.0.1 "$1")
            if [ "$OK" = "imok" ]; then
                exit 0
            else
                echo "Readiness check failed for ZooKeeper"
                exit 1
            fi
          liveness-check.sh: |
            #!/bin/sh
            # "stat" only prints a "Mode: ..." line when the node is actually serving
            echo stat | nc 127.0.0.1 "$1" | grep Mode
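
    One more gotcha with ZooKeeper 3.5+: the four-letter-word commands these scripts rely on (ruok, stat) are disabled by default and have to be whitelisted, otherwise the probes will always fail. That is a zoo.cfg setting; the official zookeeper Docker image also accepts it as an environment variable (if your image's entrypoint doesn't support that, set it in zoo.cfg directly):

        # zoo.cfg
        4lw.commands.whitelist=ruok, stat, srvr

        # or, on the container spec when using the official zookeeper image
        env:
          - name: ZOO_4LW_COMMANDS_WHITELIST
            value: "ruok, stat, srvr"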