Unable to access ports from services across nodes in overlay network in swarm mode


I use the following compose file for stack deployment:

version: '3.8'
x-deploy: &Deploy
  replicas: 1
  placement: &DeployPlacement
    max_replicas_per_node: 1
  restart_policy:
    max_attempts: 15
    window: 60s
  resources: &DeployResources
    reservations: &DeployResourcesReservations
      cpus: '0.05'
      memory: 10M
services:
  serv1:
    image: alpine
    networks:
      - test_nw
    deploy:
      <<: *Deploy
    entrypoint: ["tail", "-f", "/dev/null"]
  serv2:
    image: nginx
    networks:
      - test_nw
    deploy:
      <<: *Deploy
      placement:
        <<: *DeployPlacement
        constraints:
          - "node.role!=manager"
    expose: # deprecated, but I leave it here anyway
      - "80"
networks:
  test_nw:
    name: test_nw
    driver: overlay

For the sake of convenience, I'll refer to test_serv1 running as container1 on host1 and test_serv2 running as container2 on host2 for the rest of this post, since the actual host and container names keep changing for me.

When I get into the shell of test_serv1, the following happens when I ping serv2:

ubuntu@host1:~$ sudo docker exec -it test_serv1.1.container1 ash
/ # ping serv2
PING serv2 (10.0.7.5): 56 data bytes
64 bytes from 10.0.7.5: seq=0 ttl=64 time=0.084 ms

However, the IP of container2, as shown when inspecting it, is 10.0.7.6:

ubuntu@host2:~$ sudo docker inspect test_serv2.1.container2
[
    {
****************
        "NetworkSettings": {
            "Bridge": "",
            "HairpinMode": false,
            "LinkLocalIPv6Address": "",
            "LinkLocalIPv6PrefixLen": 0,
            "Ports": {
                "80/tcp": null
            },
****************
            "Networks": {
                "test_nw": {
                    "IPAMConfig": {
                        "IPv4Address": "10.0.7.6"
                    },
                    "Links": null,
                    "Aliases": [
                        "80c06bb29a42"
                    ],
                    "NetworkID": "sp56aiqxnt56yglsd8mc1zqpv",
                    "EndpointID": "dac52f1d7fa148f5acac20f89d6b709193b3c11fc90201424cd052785121e706",
                    "Gateway": "",
                    "IPAddress": "10.0.7.6",
                    "IPPrefixLen": 24,
                    "IPv6Gateway": "",
                    "GlobalIPv6Address": "",
                    "GlobalIPv6PrefixLen": 0,
                    "MacAddress": "02:42:0a:00:07:06",
****************
            }
        }
    }
]

I can see that container2 is listening on port 80 on all interfaces. From inside container2 I can ping both 10.0.7.5 and 10.0.7.6 (!!), and can access port 80 on both IPs (!!).

ubuntu@host2:~$ sudo docker exec -it test_serv2.1.container2 bash
root@80c06bb29a42:/# ping 10.0.7.5
PING 10.0.7.5 (10.0.7.5) 56(84) bytes of data.
64 bytes from 10.0.7.5: icmp_seq=1 ttl=64 time=0.093 ms
64 bytes from 10.0.7.5: icmp_seq=2 ttl=64 time=0.094 ms
^C
--- 10.0.7.5 ping statistics ---
2 packets transmitted, 2 received, 0% packet loss, time 8ms
rtt min/avg/max/mdev = 0.093/0.093/0.094/0.009 ms
root@80c06bb29a42:/# ping 10.0.7.6
PING 10.0.7.6 (10.0.7.6) 56(84) bytes of data.
64 bytes from 10.0.7.6: icmp_seq=1 ttl=64 time=0.035 ms
64 bytes from 10.0.7.6: icmp_seq=2 ttl=64 time=0.059 ms
64 bytes from 10.0.7.6: icmp_seq=3 ttl=64 time=0.053 ms
^C
--- 10.0.7.6 ping statistics ---
3 packets transmitted, 3 received, 0% packet loss, time 50ms
rtt min/avg/max/mdev = 0.035/0.049/0.059/0.010 ms
root@80c06bb29a42:/# netstat -tuplen
Active Internet connections (only servers)
Proto Recv-Q Send-Q Local Address           Foreign Address         State       User       Inode      PID/Program name    
tcp        0      0 0.0.0.0:80              0.0.0.0:*               LISTEN      0          33110      1/nginx: master pro 
tcp        0      0 127.0.0.11:35491        0.0.0.0:*               LISTEN      0          32855      -                   
tcp6       0      0 :::80                   :::*                    LISTEN      0          33111      1/nginx: master pro 
udp        0      0 127.0.0.11:43477        0.0.0.0:*                           0          32854      -                   
root@80c06bb29a42:/# curl 10.0.7.5:80
<!DOCTYPE html>
<html>
<head>
<title>Welcome to nginx!</title>
<style>
    body {
        width: 35em;
        margin: 0 auto;
        font-family: Tahoma, Verdana, Arial, sans-serif;
    }
</style>
</head>
<body>
<h1>Welcome to nginx!</h1>
<p>If you see this page, the nginx web server is successfully installed and
working. Further configuration is required.</p>

<p>For online documentation and support please refer to
<a href="http://nginx.org/">nginx.org</a>.<br/>
Commercial support is available at
<a href="http://nginx.com/">nginx.com</a>.</p>

<p><em>Thank you for using nginx.</em></p>
</body>
</html>
root@80c06bb29a42:/# curl 10.0.7.6:80
<!DOCTYPE html>
<html>
<head>
<title>Welcome to nginx!</title>
<style>
    body {
        width: 35em;
        margin: 0 auto;
        font-family: Tahoma, Verdana, Arial, sans-serif;
    }
</style>
</head>
<body>
<h1>Welcome to nginx!</h1>
<p>If you see this page, the nginx web server is successfully installed and
working. Further configuration is required.</p>

<p>For online documentation and support please refer to
<a href="http://nginx.org/">nginx.org</a>.<br/>
Commercial support is available at
<a href="http://nginx.com/">nginx.com</a>.</p>

<p><em>Thank you for using nginx.</em></p>
</body>
</html>
root@80c06bb29a42:/# 

However, when I try the following from container1, it fails. I simply want to throw my laptop at a wall, since I can't figure out how no one else seems to have faced this issue or posted such a question :/

ubuntu@host1:~$ sudo docker exec -it test_serv1.1.container1 ash
/ # ping serv2
PING serv2 (10.0.7.5): 56 data bytes
64 bytes from 10.0.7.5: seq=0 ttl=64 time=0.084 ms
64 bytes from 10.0.7.5: seq=1 ttl=64 time=0.086 ms
^C
--- serv2 ping statistics ---
2 packets transmitted, 2 packets received, 0% packet loss
round-trip min/avg/max = 0.084/0.085/0.086 ms
/ # curl serv2:80
^C
/ # curl --max-time 10 serv2:80
curl: (28) Connection timed out after 10001 milliseconds
/ # ping test_serv2
PING test_serv2 (10.0.7.5): 56 data bytes
64 bytes from 10.0.7.5: seq=0 ttl=64 time=0.071 ms
64 bytes from 10.0.7.5: seq=1 ttl=64 time=0.064 ms
64 bytes from 10.0.7.5: seq=2 ttl=64 time=0.125 ms
^C
--- test_serv2 ping statistics ---
3 packets transmitted, 3 packets received, 0% packet loss
round-trip min/avg/max = 0.064/0.086/0.125 ms
/ # curl --max-time 10 test_serv2:80
curl: (28) Connection timed out after 10001 milliseconds
/ # ping 10.0.7.6
PING 10.0.7.6 (10.0.7.6): 56 data bytes
^C
--- 10.0.7.6 ping statistics ---
87 packets transmitted, 0 packets received, 100% packet loss
/ # curl --max-time 10 10.0.7.6:80
curl: (28) Connection timed out after 10001 milliseconds
/ # 

I have checked that all the docker ports (TCP 2376, 2377, 7946, 80 and UDP 7946, 4789) are open on both nodes.
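
A minimal sketch of how the TCP side of such a check could be scripted from one node against the other. It assumes bash with /dev/tcp support and the coreutils `timeout` command are available; the function names `check_tcp` and `report_tcp` are made up for illustration. It only covers the swarm TCP ports from the list above (2377 and 7946).

```shell
#!/usr/bin/env bash
# Swarm TCP ports taken from the list above:
# 2377/tcp (cluster management) and 7946/tcp (node communication).
SWARM_TCP_PORTS="2377 7946"

# check_tcp HOST PORT
# Exit status 0 if a TCP connection succeeds within 3 seconds.
# Assumption: bash was built with /dev/tcp redirection support.
check_tcp() {
  timeout 3 bash -c "exec 3<>/dev/tcp/$1/$2" 2>/dev/null
}

# report_tcp HOST
# Print one status line per swarm TCP port for the given host.
report_tcp() {
  local host=$1 port
  for port in $SWARM_TCP_PORTS; do
    if check_tcp "$host" "$port"; then
      echo "$host:$port/tcp open"
    else
      echo "$host:$port/tcp unreachable"
    fi
  done
}
```

For example, `report_tcp host2` could be run on host1 and `report_tcp host1` on host2. A TCP connect test says nothing about UDP 7946 or UDP 4789, so those still have to be verified in the firewall or security group rules themselves.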

What is going wrong here? Any help is truly appreciated!


Solution

  • I'm posting this for anyone who might come looking, since there is no answer yet.

    A few things to consider (even though they are all mentioned in the question):

    1. Ensure once again that all ports are open. Check iptables thoroughly, even if you have already configured it once. The Docker engine seems to change the configuration and at times leaves it in an unusable state if you open the ports after Docker has started (restarting won't fix it; you need to hard stop Docker -> reset iptables -> start Docker CE).
    2. Ensure your machines' local IP addresses do not conflict. This is a big deal. I am unable to describe it precisely, but it helps to understand the various IP address classes and check whether any of the ranges in use conflict.
    3. Probably the most trivial, but almost always omitted, instruction: remember to ALWAYS init or join a swarm with both --advertise-addr and --listen-addr. The --advertise-addr should be the IP address that the other hosts use to reach this host (a public-facing address, even if it is not internet-facing). The --listen-addr is not documented well enough, but it must be the IP of the interface that Docker should bind to.
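
As a rough sketch of point 3, the helper functions below only print the `docker swarm init` and `docker swarm join` commands with both flags set, so they can be reviewed before being run. The function names and all addresses are hypothetical placeholders (documentation-range IPs), and 2377 is the default swarm management port; adjust everything to your own hosts.

```shell
#!/usr/bin/env bash
# swarm_init_cmd ADVERTISE_ADDR LISTEN_ADDR
# Print the init command for the first manager.
#   ADVERTISE_ADDR: the IP other hosts use to reach this host
#   LISTEN_ADDR:    the IP of the local interface Docker should bind to
swarm_init_cmd() {
  printf 'docker swarm init --advertise-addr %s --listen-addr %s:2377\n' \
    "$1" "$2"
}

# swarm_join_cmd ADVERTISE_ADDR LISTEN_ADDR MANAGER_ADDR TOKEN
# Print the join command for an additional node.
#   TOKEN comes from `docker swarm join-token worker` on a manager.
swarm_join_cmd() {
  printf 'docker swarm join --advertise-addr %s --listen-addr %s:2377 --token %s %s:2377\n' \
    "$1" "$2" "$4" "$3"
}

# Hypothetical example values; replace with your own addresses and token.
swarm_init_cmd 203.0.113.10 10.0.0.10
swarm_join_cmd 203.0.113.20 10.0.0.20 203.0.113.10 '<worker-token>'
```

Printing instead of executing is a deliberate choice here: on hosts where the address other nodes use differs from the address bound to the local interface, it is easy to swap the two values, and a dry run makes that visible first.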

    Having gone through the above, please note that AWS EC2 does not play well with cross-provider hosts. If you have machines spread across providers (say, IBM, Azure, GCP, etc.), EC2 plays spoilsport. I'm very curious about how this happens (it has to be some low-level network interference), but I spent a considerable amount of time trying to get it to work and it wouldn't.