amazon-ecs apache-zookeeper aws-service-connect

ZooKeeper with ECS ServiceConnect: ConnectionLossException: KeeperErrorCode = ConnectionLoss


I have a docker compose stack that includes ZooKeeper. It has worked beautifully for years.

  zoo:
    container_name: zoo
    image: public.ecr.aws/docker/library/zookeeper:3.9.3
    restart: unless-stopped
    stdin_open: true
    tty: true

My Java and Ruby clients connect to ZooKeeper using zoo:2181 as the connection string.

I am now running this same container in AWS ECS. I am using a ServiceConnectConfiguration to make the container discoverable with the name "zoo".

My Ruby client seems to have no issue connecting to ZooKeeper.

My Java client is very unreliable if I use the ServiceConnect name.

jshell> try (ZooKeeper zk = new ZooKeeper(String.format("%s:%d", "zoo", 2181), 5000, null)){
   ...>   System.out.println(zk.getChildren("/",false));
   ...> }
[batches, zookeeper, batch-uuids, locks, jobs]

jshell> try (ZooKeeper zk = new ZooKeeper(String.format("%s:%d", "zoo", 2181), 5000, null)){
   ...>   System.out.println(zk.getChildren("/",false));
   ...> }
[batches, zookeeper, batch-uuids, locks, jobs]

jshell> try (ZooKeeper zk = new ZooKeeper(String.format("%s:%d", "zoo", 2181), 5000, null)){
   ...>   System.out.println(zk.getChildren("/",false));
   ...> }
|  Exception org.apache.zookeeper.KeeperException$ConnectionLossException: KeeperErrorCode = ConnectionLoss for /
|        at KeeperException.create (KeeperException.java:101)
|        at KeeperException.create (KeeperException.java:53)
|        at ZooKeeper.getChildren (ZooKeeper.java:2366)
|        at ZooKeeper.getChildren (ZooKeeper.java:2393)
|        at (#9:2)

If I use the IP address for "zoo" from /etc/hosts, I have no trouble connecting.

jshell> try (ZooKeeper zk = new ZooKeeper(String.format("%s:%d", "127.255.0.10", 2181), 5000, null)){
   ...>   System.out.println(zk.getChildren("/",false));
   ...> }
[batches, zookeeper, batch-uuids, locks, jobs]

jshell> try (ZooKeeper zk = new ZooKeeper(String.format("%s:%d", "127.255.0.10", 2181), 5000, null)){
   ...>   System.out.println(zk.getChildren("/",false));
   ...> }
[batches, zookeeper, batch-uuids, locks, jobs]

jshell> try (ZooKeeper zk = new ZooKeeper(String.format("%s:%d", "127.255.0.10", 2181), 5000, null)){
   ...>   System.out.println(zk.getChildren("/",false));
   ...> }
[batches, zookeeper, batch-uuids, locks, jobs]

jshell> try (ZooKeeper zk = new ZooKeeper(String.format("%s:%d", "127.255.0.10", 2181), 5000, null)){
   ...>   System.out.println(zk.getChildren("/",false));
   ...> }
[batches, zookeeper, batch-uuids, locks, jobs]

I am unsure whether I need to fix this in the ServiceConnect configuration or whether I can configure my ZooKeeper service to function more effectively in this environment.

Looking at the services in ECS, the running tasks appear to have sufficient memory and CPU.


Solution

  • Forcing the JVM to use IPv4 (by starting it with -Djava.net.preferIPv4Stack=true) seems to have resolved this issue.

    bash-4.2# jshell -R -Djava.net.preferIPv4Stack=true
    |  Welcome to JShell -- Version 21.0.6
    |  For an introduction type: /help intro
    
    jshell> import org.apache.zookeeper.ZooKeeper;
    
    jshell> try (ZooKeeper zk = new ZooKeeper(String.format("%s:%d", "zoo", 2181), 5000, null)){
       ...>   System.out.println(zk.getChildren("/",false));
       ...> }
    [batches, zookeeper, batch-uuids, jobs]
    
    jshell> try (ZooKeeper zk = new ZooKeeper(String.format("%s:%d", "zoo", 2181), 5000, null)){
       ...>   System.out.println(zk.getChildren("/",false));
       ...> }
    [batches, zookeeper, batch-uuids, jobs]
    
    jshell> try (ZooKeeper zk = new ZooKeeper(String.format("%s:%d", "zoo", 2181), 5000, null)){
       ...>   System.out.println(zk.getChildren("/",false));
       ...> }
    [batches, zookeeper, batch-uuids, jobs]
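
To check whether address-family selection is really involved, it may help to list what the JVM resolves for the Service Connect name. The following is a minimal sketch, not an established diagnosis: `ResolveCheck` and `describe` are hypothetical names introduced here, and the assumption that "zoo" resolves to both an IPv4 and an IPv6 address is not confirmed by the output above.

```java
import java.net.Inet6Address;
import java.net.InetAddress;
import java.net.UnknownHostException;
import java.util.ArrayList;
import java.util.List;

// Hypothetical diagnostic helper; not part of ZooKeeper or ECS tooling.
class ResolveCheck {

    // Returns one "IPv4 <addr>" or "IPv6 <addr>" entry per address the JVM
    // resolves for the host, or an empty list if the name cannot be resolved.
    static List<String> describe(String host) {
        List<String> result = new ArrayList<>();
        try {
            for (InetAddress address : InetAddress.getAllByName(host)) {
                String family = (address instanceof Inet6Address) ? "IPv6" : "IPv4";
                result.add(family + " " + address.getHostAddress());
            }
        } catch (UnknownHostException e) {
            // Unresolvable name: report nothing rather than failing.
        }
        return result;
    }

    public static void main(String[] args) {
        // "zoo" is the Service Connect name from the question.
        String host = args.length > 0 ? args[0] : "zoo";
        describe(host).forEach(System.out::println);
    }
}
```

Running this inside the client container once as `java ResolveCheck.java zoo` and once as `java -Djava.net.preferIPv4Stack=true ResolveCheck.java zoo` lets you compare the two lists. If an IPv6 entry appears only in the first run, that would be consistent with the fix above, although it would not by itself prove the cause. For a long-running service, the same system property can be passed on the `java` command line at startup.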