dockerkubernetesapache-kafkaapache-zookeepergoogle-kubernetes-engine

Kafka not able to connect with zookeeper with error "Timed out waiting for connection while in state: CONNECTING"


I am trying to run my kafka and zookeeper in kubernetes pods.

Here is my zookeeper-service.yaml:

apiVersion: v1
kind: Service
metadata:
  annotations:
    kompose.cmd: kompose convert
    kompose.version: 1.1.0 (36652f6)
  creationTimestamp: null
  labels:
    io.kompose.service: zookeeper-svc
  name: zookeeper-svc
spec:
  ports:
  - name: "2181"
    port: 2181
    targetPort: 2181
  selector:
    io.kompose.service: zookeeper
status:
  loadBalancer: {}

Below is zookeeper-deployment.yaml

apiVersion: extensions/v1beta1
kind: Deployment
metadata:
  annotations:
    kompose.cmd: kompose convert
    kompose.version: 1.1.0 (36652f6)
  creationTimestamp: null
  labels:
    io.kompose.service: zookeeper
  name: zookeeper
spec:
  replicas: 1
  strategy: {}
  template:
    metadata:
      creationTimestamp: null
      labels:
        io.kompose.service: zookeeper
    spec:
      containers:
      - image: wurstmeister/zookeeper
        name: zookeeper
        ports:
        - containerPort: 2181
        resources: {}
      restartPolicy: Always
status: {}

kafka-deployment.yaml is as below:

apiVersion: extensions/v1beta1
kind: Deployment
metadata:
  annotations:
    kompose.cmd: kompose convert -f docker-compose.yml
    kompose.version: 1.1.0 (36652f6)
  creationTimestamp: null
  labels:
    io.kompose.service: kafka
  name: kafka
spec:
  replicas: 1
  strategy: {}
  template:
    metadata:
      creationTimestamp: null
      labels:
        io.kompose.service: kafka
    spec:
      containers:
      - env:
        - name: KAFKA_ADVERTISED_HOST_NAME
          value: kafka
        - name: KAFKA_ZOOKEEPER_CONNECT
          value: zookeeper:2181
        - name: KAFKA_PORT
          value: "9092"
        - name: KAFKA_ZOOKEEPER_CONNECT_TIMEOUT_MS
          value: "60000"
        image: wurstmeister/kafka
        name: kafka
        ports:
        - containerPort: 9092
        resources: {}
      restartPolicy: Always
status: {}

I first start the zookeeper service and deployment. Once the zookeeper is started and kubectl get pods shows it in running state, I start kafka deployment. Kafka deployment starts failing and restarting again and again, due to restartPolicy as always. When I checked the logs from kafka docker, I found that it is not able to connect to zookeeper service and the connection timesout. Here are the logs from kafka container.

[2018-09-03 07:06:06,670] ERROR Fatal error during KafkaServer startup. Prepare to shutdown (kafka.server.KafkaServer)
kafka.zookeeper.ZooKeeperClientTimeoutException: Timed out waiting for connection while in state: CONNECTING
atkafka.zookeeper.ZooKeeperClient$$anonfun$kafka$zookeeper$ZooKeeperClient$$ waitUntilConnected$1.apply$mcV$sp(ZooKeeperClient.scala:230)
at kafka.zookeeper.ZooKeeperClient$$anonfun$kafka$zookeeper$ZooKeeperClient$$waitUntilConnected$1.apply(ZooKeeperClient.scala:226)
at kafka.zookeeper.ZooKeeperClient$$anonfun$kafka$zookeeper$ZooKeeperClient$$waitUntilConnected$1.apply(ZooKeeperClient.scala:226)
at kafka.utils.CoreUtils$.inLock(CoreUtils.scala:251)
at kafka.zookeeper.ZooKeeperClient.kafka$zookeeper$ZooKeeperClient$$waitUntilConnected(ZooKeeperClient.scala:226)
at kafka.zookeeper.ZooKeeperClient.<init>(ZooKeeperClient.scala:95)
at kafka.zk.KafkaZkClient$.apply(KafkaZkClient.scala:1580)
at kafka.server.KafkaServer.kafka$server$KafkaServer$$createZkClient$1(KafkaServer.scala:348)
at kafka.server.KafkaServer.initZkClient(KafkaServer.scala:372)
at kafka.server.KafkaServer.startup(KafkaServer.scala:202)
at kafka.server.KafkaServerStartable.startup(KafkaServerStartable.scala:38)
at kafka.Kafka$.main(Kafka.scala:75)
at kafka.Kafka.main(Kafka.scala)
[2018-09-03 07:06:06,671] INFO shutting down (kafka.server.KafkaServer)
[2018-09-03 07:06:06,673] WARN  (kafka.utils.CoreUtils$)
java.lang.NullPointerException
atkafka.server.KafkaServer$$anonfun$shutdown$5.apply$mcV$sp(KafkaServer.scala:579)
at kafka.utils.CoreUtils$.swallow(CoreUtils.scala:86)
at kafka.server.KafkaServer.shutdown(KafkaServer.scala:579)
at kafka.server.KafkaServer.startup(KafkaServer.scala:329)
at kafka.server.KafkaServerStartable.startup(KafkaServerStartable.scala:38)
at kafka.Kafka$.main(Kafka.scala:75)
at kafka.Kafka.main(Kafka.scala)
[2018-09-03 07:06:06,676] INFO shut down completed 
(kafka.server.KafkaServer)
[2018-09-03 07:06:06,677] ERROR Exiting Kafka. 
(kafka.server.KafkaServerStartable)
[2018-09-03 07:06:06,678] INFO shutting down 
(kafka.server.KafkaServer)

What could be the reason for this ? and solutions ?

Edit: logs from zookeeper pod:

2018-09-03 10:32:39,562 [myid:] - INFO  
[main:ZooKeeperServerMain@96] - Starting server
2018-09-03 10:32:39,567 [myid:] - INFO  [main:Environment@100] - 
Server environment:zookeeper.version=3.4.9-1757313, built on 
08/23/2016 06:50 GMT
2018-09-03 10:32:39,567 [myid:] - INFO  [main:Environment@100] - 
Server environment:host.name=zookeeper-7594d99b-sgm6p
2018-09-03 10:32:39,567 [myid:] - INFO  [main:Environment@100] - 
Server environment:java.version=1.7.0_65
2018-09-03 10:32:39,567 [myid:] - INFO  [main:Environment@100] - 
Server environment:java.vendor=Oracle Corporation
2018-09-03 10:32:39,567 [myid:] - INFO  [main:Environment@100] - 
Server environment:java.home=/usr/lib/jvm/java-7-openjdk-amd64/jre
2018-09-03 10:32:39,567 [myid:] - INFO  [main:Environment@100] - 
Server environment:java.class.path=/opt/zookeeper- 
3.4.9/bin/../build/classes:/opt/zookeeper- 
3.4.9/bin/../build/lib/*.jar:/opt/zookeeper-3.4.9/bin/../lib/slf4j- 
log4j12-1.6.1.jar:/opt/zookeeper-3.4.9/bin/../lib/slf4j-api-1.6. 
1.ja r:/opt/zookeeper-3.4.9/bin/../lib/netty- 
3.10.5.Final.jar:/opt/zookeeper-3.4.9/bin/../lib/log4j- 
1.2.16.jar:/opt/zookeeper-3.4.9/bin/../lib/jline- 
0.9.94.jar:/opt/zookeeper-3.4.9/bin/../zookeeper- 
3.4.9.jar:/opt/zookeeper- 
3.4.9/bin/../src/java/lib/*.jar:/opt/zookeeper-3.4.9/bin/../conf:

2018-09-03 10:32:39,567 [myid:] - INFO  [main:Environment@100] - 
Server environment:java.io.tmpdir=/tmp
2018-09-03 10:32:39,569 [myid:] - INFO  [main:Environment@100] - 
Server environment:java.compiler=<NA>
2018-09-03 10:32:39,569 [myid:] - INFO  [main:Environment@100] - 
Server environment:os.name=Linux
2018-09-03 10:32:39,569 [myid:] - INFO  [main:Environment@100] - 
Server environment:os.arch=amd64 
2018-09-03 10:32:39,569 [myid:] - INFO  [main:Environment@100] - 
Server environment:os.version=4.15.0-20-generic
2018-09-03 10:32:39,569 [myid:] - INFO  [main:Environment@100] -     
Server environment:user.name=root
2018-09-03 10:32:39,569 [myid:] - INFO  [main:Environment@100] - 
Server environment:user.home=/root
2018-09-03 10:32:39,569 [myid:] - INFO  [main:Environment@100] - 
Server environment:user.dir=/opt/zookeeper-3.4.9
2018-09-03 10:32:39,570 [myid:] - INFO  [main:ZooKeeperServer@815] 
- 
tickTime set to 2000
2018-09-03 10:32:39,571 [myid:] - INFO  [main:ZooKeeperServer@824] 
- 
minSessionTimeout set to -1
2018-09-03 10:32:39,571 [myid:] - INFO  [main:ZooKeeperServer@833] 
- 
maxSessionTimeout set to -1
2018-09-03 10:32:39,578 [myid:] - INFO  
[main:NIOServerCnxnFactory@89] 
- binding to port 0.0.0.0/0.0.0.0:2181 

Edit: starting logs from kafka container:

Excluding KAFKA_HOME from broker config
[Configuring] 'advertised.host.name' in 
'/opt/kafka/config/server.properties'
[Configuring] 'port' in '/opt/kafka/config/server.properties'
[Configuring] 'broker.id' in '/opt/kafka/config/server.properties'
Excluding KAFKA_VERSION from broker config
[Configuring] 'zookeeper.connect' in 
'/opt/kafka/config/server.properties'
[Configuring] 'log.dirs' in '/opt/kafka/config/server.properties'
[Configuring] 'zookeeper.connect.timeout.ms' in 
'/opt/kafka/config/server.properties'
 [2018-09-05 10:47:22,036] INFO Registered 
kafka:type=kafka.Log4jController MBean 
(kafka.utils.Log4jControllerRegistration$) 
[2018-09-05 10:47:23,145] INFO starting (kafka.server.KafkaServer)
[2018-09-05 10:47:23,148] INFO Connecting to zookeeper on 
zookeeper:2181 (kafka.server.KafkaServer)
[2018-09-05 10:47:23,288] INFO [ZooKeeperClient] Initializing a new 
session to zookeeper:2181. (kafka.zookeeper.ZooKeeperClient)
[2018-09-05 10:47:23,300] INFO Client 
environment:zookeeper.version=3.4.13- 
2d71af4dbe22557fda74f9a9b4309b15a7487f03, built on 06/29/2018 00:39 
GMT (org.apache.zookeeper.ZooKeeper)
[2018-09-05 10:47:23,300] INFO Client environment:host.name=kafka 
-757dc6c47b-zpzfz (org.apache.zookeeper.ZooKeeper)
[2018-09-05 10:47:23,300] INFO Client 
environment:java.version=1.8.0_171 (org.apache.zookeeper.ZooKeeper)
[2018-09-05 10:47:23,301] INFO Client 
environment:java.vendor=Oracle Corporation 
(org.apache.zookeeper.ZooKeeper)
[2018-09-05 10:47:23,301] INFO Client 
environment:java.home=/usr/lib/jvm/java-1.8-openjdk/jre 
(org.apache.zookeeper.ZooKeeper)
[2018-09-05 10:47:23,301] INFO Client 
environment:java.class.path=/opt/kafka/bin/../libs/activation- 
1.1.1.jar:/opt/kafka/bin/../libs/aopalliance-repackaged-2.5.0- 
b42.jar:/opt/kafka/bin/../libs/argparse4j- 
0.7.0.jar:/opt/kafka/bin/../libs/audience-annotations- 
0.5.0.jar:/opt/kafka/bin/../libs/commons-lang3- 
3.5.jar:/opt/kafka/bin/../libs/connect-api- 
2.0.0.jar:/opt/kafka/bin/../libs/connect-basic-auth-extension- 
2.0.0.jar:/opt/kafka/bin/../libs/connect-file- 
2.0.0.jar:/opt/kafka/bin/../libs/connect-json- 
2.0.0.jar:/opt/kafka/bin/../libs/connect-runtime- 
2.0.0.jar:/opt/kafka/bin/../libs/connect-transforms- 
2.0.0.jar:/opt/kafka/bin/../libs/guava- 
20.0.jar:/opt/kafka/bin/../libs/hk2-api-2.5.0- 
b42.jar:/opt/kafka/bin/../libs/hk2-locator-2.5.0- 
b42.jar:/opt/kafka/bin/../libs/hk2-utils-2.5.0- 
b42.jar:/opt/kafka/bin/../libs/jackson-annotations- 
2.9.6.jar:/opt/kafka/bin/../libs/jackson-core- 
2.9.6.jar:/opt/kafka/bin/../libs/jackson-databind- 
2.9.6.jar:/opt/kafka/bin/../libs/jackson-jaxrs-json-provider- 
2.9.6.jar:/opt/kafka/bin/../libs/jackson-module-jaxb-annotations- 
CR2.jar:/opt/kafka/bin/../libs/javax.annotation-api- 
1.2.jar:/opt/kafka/bin/../libs/javax.inject- 
1.jar:/opt/kafka/bin/../libs/javax.inject-2.5.0- 
b42.jar:/opt/kafka/bin/../libs/javax.servlet-api- 
3.1.0.jar:/opt/kafka/bin/../libs/javax.ws.rs-api- 
2.1.jar:/opt/kafka/bin/../libs/jaxb-api- 
2.3.0.jar:/opt/kafka/bin/../libs/jersey-client- 
2.27.jar:/opt/kafka/bin/../libs/jersey-common- 
2.27.jar:/opt/kafka/bin/../libs/jersey-container-servlet 
-2.27.jar:/opt/kafka/bin/../libs/jersey-container-servlet-core- 
2.27.jar:/opt/kafka/bin/../libs/jersey-hk2- 
2.27.jar:/opt/kafka/bin/../libs/jersey-media-jaxb- 
2.27.jar:/opt/kafka/bin/../libs/jersey-server 
-2.27.jar:/opt/kafka/bin/../libs/jetty-client 
-9.4.11.v20180605.jar:/opt/kafka/bin/../libs/jetty-continuation- 
9.4.11.v20180605.jar:/opt/kafka/bin/../libs/jetty-http- 
9.4.11.v20180605.jar:/opt/kafka/bin/../libs/jetty-io- 
9.4.11.v20180605.jar:/opt/kafka/bin/../libs/jetty-security- 
9.4.11.v20180605.jar:/opt/kafka/bin/../libs/jetty-server- 
9.4.11.v20180605.jar:/opt/kafka/bin/../libs/jetty-servlet- 
9.4.11.v20180605.jar:/opt/kafka/bin/../libs/jetty-servlets- 
9.4.11.v20180605.jar:/opt/kafka/bin/../libs/jetty-util- 
9.4.11.v20180605.jar:/opt/kafka/bin/../libs/jopt-simple- 
5.0.4.jar:/opt/kafka/bin/../libs/kafka-clients- 
2.0.0.jar:/opt/kafka/bin/../libs/kafka-log4j-appender- 
2.0.0.jar:/opt/kafka/bin/../libs/kafka-streams- 
2.0.0.jar:/opt/kafka/bin/../libs/kafka-streams-examples- 
2.0.0.jar:/opt/kafka/bin/../libs/kafka-streams-scala_2.11- 
2.0.0.jar:/opt/kafka/bin/../libs/kafka-streams-test-utils- 
2.0.0.jar:/opt/kafka/bin/../libs/kafka-tools- 
2.0.0.jar:/opt/kafka/bin/../libs/kafka_2.11-2.0.0- 
sources.jar:/opt/kafka/bin/../libs/kafka_2.11-2 
 .0.0.jar:/opt/kafka/bin/../libs/log4j 
1.2.17.jar:/opt/kafka/bin/../libs/lz4-java- 
1.4.1.jar:/opt/kafka/bin/../libs/maven-artifact- 
3.5.3.jar:/opt/kafka/bin/../libs/metrics-core- 
2.2.0.jar:/opt/kafka/bin/../libs/osgi-resource-locator- 
1.0.1.jar:/opt/kafka/bin/../libs/plexus-utils- 
3.1.0.jar:/opt/kafka/bin/../libs/reflections- 
0.9.11.jar:/opt/kafka/bin/../libs/rocksdbjni- 
5.7.3.jar:/opt/kafka/bin/../libs/scala-library- 
2.11.12.jar:/opt/kafka/bin/../libs/scala-logging_2.11- 
3.9.0.jar:/opt/kafka/bin/../libs/scala-reflect- 
2.11.12.jar:/opt/kafka/bin/../libs/slf4j-api- 
1.7.25.jar:/opt/kafka/bin/../libs/slf4j-log4j12- 
1.7.25.jar:/opt/kafka/bin/../libs/snappy-java- 
1.1.7.1.jar:/opt/kafka/bin/../libs/validation-api- 
1.1.0.Final.jar:/opt/kafka/bin/../libs/zkclient- 
0.10.jar:/opt/kafka/bin/../libs/zookeeper-3.4.13.jar 
(org.apache.zookeeper.ZooKeeper) 

output for kubectl get svc -o wide is as follows:

NAME         TYPE        CLUSTER-IP      EXTERNAL-IP   PORT(S)    AGE       SELECTOR
kubernetes   ClusterIP   10.96.0.1       <none>        443/TCP    50m       <none>
zookeeper    ClusterIP   10.98.180.138   <none>        2181/TCP   48m       io.kompose.service=zookeeper

output from kubectl get pods -o wide:

NAME                       READY     STATUS             RESTARTS   AGE       IP           NODE
kafka-757dc6c47b-zpzfz     0/1       CrashLoopBackOff   15         1h        10.32.0.17   administrator-thinkpad-l480
zookeeper-7594d99b-784n9   1/1       Running            0          1h        10.32.0.19   administrator-thinkpad-l480

Edit: output from kubectl describe pod kafka-757dc6c47b-zpzfz:

Name:           kafka-757dc6c47b-zpzfz
Namespace:      default
Node:           administrator-thinkpad-l480/10.11.17.86
Start Time:     Wed, 05 Sep 2018 16:17:06 +0530
Labels:         io.kompose.service=kafka
            pod-template-hash=3138727036
Annotations:    <none>
Status:         Running
IP:             10.32.0.17
Controlled By:  ReplicaSet/kafka-757dc6c47b
Containers:
  kafka:
   Container ID:docker://2bdc06d876ae23437c61f4e95539a67903cdb61e88fd9c68377b47c7705293a3
    Image:          wurstmeister/kafka
    Image ID:       docker-pullable://wurstmeister/kafka@sha256:2e3ff64e70ea983530f590282f36991c0a1b105350510f53cc3d1a0279b83c28
    Port:           9092/TCP
    State:          Waiting
      Reason:       CrashLoopBackOff
    Last State:     Terminated
      Reason:       Error
      Exit Code:    1
      Started:      Wed, 05 Sep 2018 17:29:06 +0530
      Finished:     Wed, 05 Sep 2018 17:29:14 +0530
    Ready:          False
    Restart Count:  18
    Environment:
      KAFKA_ADVERTISED_HOST_NAME:          kafka
      KAFKA_ZOOKEEPER_CONNECT:             zookeeper:2181
      KAFKA_PORT:                          9092
      KAFKA_ZOOKEEPER_CONNECT_TIMEOUT_MS:  160000
    Mounts:
      /var/run/secrets/kubernetes.io/serviceaccount from default-token-nhb9z (ro)
Conditions:
  Type              Status
  Initialized       True 
  Ready             False 
  ContainersReady   False 
  PodScheduled      True 
Volumes:
  default-token-nhb9z:
    Type:        Secret (a volume populated by a Secret)
    SecretName:  default-token-nhb9z
    Optional:    false
QoS Class:       BestEffort
Node-Selectors:  <none>
Tolerations:     node.kubernetes.io/not-ready:NoExecute for 300s
             node.kubernetes.io/unreachable:NoExecute for 300s
Events:
  Type     Reason   Age                From                                  Message
  ----     ------   ----               ----                                  -------
  Warning  BackOff  3m (x293 over 1h)  kubelet, administrator-thinkpad-l480  Back-off restarting failed container

Solution

  • What could be the reason for this ? and solutions ?

    The reason is hidden behind following log line:

    INFO Connecting to zookeeper on zookeeper:2181 (kafka.server.KafkaServer)
    

    Kafka is searching for zookeeper while it should search for your service name which is zookeeper-svc.

    The solution is then straightforward: to either rename your zookeeper service from zookeeper-svc to zookeeper in your zookeeper-service.yaml like so:

    apiVersion: v1
    kind: Service
    metadata:
      annotations:
        kompose.cmd: kompose convert
        kompose.version: 1.1.0 (36652f6)
      creationTimestamp: null
      labels:
        io.kompose.service: zookeeper
      name: zookeeper
    spec:
      ports:
      - name: "2181"
        port: 2181
        targetPort: 2181
      selector:
        io.kompose.service: zookeeper
    status:
      loadBalancer: {}
    

    or, more clean, to keep service name as it is and reconfigure Kafka to search for zookeeper-svc instead of zookeeper.

    Note: since your minimal example was enough to reproduce the issue (thank you, it is nice to see working minimal example!), although pod is up and running (not in error state) it still shows: java.io.IOException: Can't resolve address: kafka:9092 which is tied to another issue (service not covering Kafka deployment) and is out of the scope of this question, just to let you know.