I am trying to create a cluster by adding 2 custom VMs.
I create the cluster by setting its name and defining the roles for each node (etcd, controlplane and worker), and then run the generated command on each node.
After several minutes waiting, I see the following error:
[[network] Host [X.Y.Z.14] is not able to connect to the following ports: [X.Y.Z.10:2379, X.Y.Z.10:2380]. Please check network policies and firewall rules]
Those are the IP addresses of the two nodes being added to the cluster. The Rancher server's IP is X.Y.Z.9, and it has none of these roles.
All 3 VMs (the server and the two nodes) are fresh CentOS 7 installs. I did this setup with SELinux enabled, and I also tried disabling it on all 3 VMs to check whether the problem was between SELinux and Rancher.
Am I missing a step? Where should I be looking? I have checked the logs of the Rancher server container; here is part of the log:
2019/12/02 12:10:26 [INFO] kontainerdriver rancherkubernetesengine stopped
2019/12/02 12:10:26 [ERROR] ClusterController c-mb7xc [cluster-provisioner-controller] failed with : [[network] Host [X.Y.Z.14] is not able to connect to the following ports: [X.Y.Z.10:2379, X.Y.Z.10:2380]. Please check network policies and firewall rules]
2019-12-02 12:13:26.885195 I | mvcc: store.index: compact 115706
2019-12-02 12:13:26.886955 I | mvcc: finished scheduled compaction at 115706 (took 1.379118ms)
2019/12/02 12:14:26 [INFO] Provisioning cluster [c-mb7xc]
2019/12/02 12:14:26 [INFO] Creating cluster [c-mb7xc]
2019/12/02 12:14:31 [INFO] kontainerdriver rancherkubernetesengine listening on address 127.0.0.1:42728
2019/12/02 12:14:31 [ERROR] Cluster c-mb7xc previously failed to create
2019/12/02 12:14:31 [INFO] cluster [c-mb7xc] provisioning: Initiating Kubernetes cluster
2019/12/02 12:14:31 [INFO] cluster [c-mb7xc] provisioning: [certificates] Generating admin certificates and kubeconfig
2019/12/02 12:14:31 [INFO] cluster [c-mb7xc] provisioning: Successfully Deployed state file at [management-state/rke/rke-770316984/cluster.rkestate]
2019/12/02 12:14:31 [INFO] cluster [c-mb7xc] provisioning: Building Kubernetes cluster
2019/12/02 12:14:31 [INFO] cluster [c-mb7xc] provisioning: [dialer] Setup tunnel for host [X.Y.Z.14]
2019/12/02 12:14:31 [INFO] [network] Starting stopped container [rke-etcd-port-listener] on host [X.Y.Z.10]
2019/12/02 12:14:31 [INFO] Starting container [rke-etcd-port-listener] on host [X.Y.Z.10], try #1
2019/12/02 12:14:31 [INFO] [network] Starting stopped container [rke-etcd-port-listener] on host [X.Y.Z.14]
2019/12/02 12:14:31 [INFO] Starting container [rke-etcd-port-listener] on host [X.Y.Z.14], try #1
2019/12/02 12:14:31 [INFO] cluster [c-mb7xc] provisioning: [dialer] Setup tunnel for host [X.Y.Z.10]
2019/12/02 12:14:31 [INFO] cluster [c-mb7xc] provisioning: [network] Deploying port listener containers
2019/12/02 12:14:31 [WARNING] Can't start Docker container [rke-etcd-port-listener] on host [X.Y.Z.10]: Error response from daemon: driver failed programming external connectivity on endpoint rke-etcd-port-listener (5a38613b1495ef436cd7842ade853e6f2a11948f5f00f0d2a0ff0d57e83aa115): Error starting userland proxy: listen tcp 0.0.0.0:2380: bind: address already in use
2019/12/02 12:14:31 [INFO] Starting container [rke-etcd-port-listener] on host [X.Y.Z.10], try #2
2019/12/02 12:14:31 [WARNING] Can't start Docker container [rke-etcd-port-listener] on host [X.Y.Z.14]: Error response from daemon: driver failed programming external connectivity on endpoint rke-etcd-port-listener (445b5b6cbaf4a2078f15d44741b91245d4f63288bb1ad3894787f9060ada4e33): Error starting userland proxy: listen tcp 0.0.0.0:2380: bind: address already in use
2019/12/02 12:14:31 [INFO] Starting container [rke-etcd-port-listener] on host [X.Y.Z.14], try #2
2019/12/02 12:14:32 [WARNING] Can't start Docker container [rke-etcd-port-listener] on host [X.Y.Z.10]: Error response from daemon: driver failed programming external connectivity on endpoint rke-etcd-port-listener (d5a2cfc270aab68cee979b2fe1705a2ff574ba167f0ad011d2626e4edc94ac01): Error starting userland proxy: listen tcp 0.0.0.0:2380: bind: address already in use
2019/12/02 12:14:32 [INFO] Starting container [rke-etcd-port-listener] on host [X.Y.Z.10], try #3
2019/12/02 12:14:32 [WARNING] Can't start Docker container [rke-etcd-port-listener] on host [X.Y.Z.14]: Error response from daemon: driver failed programming external connectivity on endpoint rke-etcd-port-listener (1e7463f5c50b7d967824a380695cbf6f73e1c8f13368c6e12712330e64d6a358): Error starting userland proxy: listen tcp 0.0.0.0:2380: bind: address already in use
2019/12/02 12:14:32 [INFO] Starting container [rke-etcd-port-listener] on host [X.Y.Z.14], try #3
2019/12/02 12:14:32 [WARNING] Can't start Docker container [rke-etcd-port-listener] on host [X.Y.Z.10]: Error response from daemon: driver failed programming external connectivity on endpoint rke-etcd-port-listener (81941519760f80c47c05f5a44c8076adfb796a6675201614931b51bbb7b63714): Error starting userland proxy: listen tcp 0.0.0.0:2380: bind: address already in use
2019/12/02 12:14:32 [WARNING] Can't start Docker container [rke-etcd-port-listener] on host [X.Y.Z.14]: Error response from daemon: driver failed programming external connectivity on endpoint rke-etcd-port-listener (9fff51bada3afffc9c14a7c5ddf5f25e889b71a2d128dca8d7cda8c56fa7fed4): Error starting userland proxy: listen tcp 0.0.0.0:2380: bind: address already in use
2019/12/02 12:14:32 [INFO] cluster [c-mb7xc] provisioning: [network] Port listener containers deployed successfully
2019/12/02 12:14:32 [INFO] Image [rancher/rke-tools:v0.1.51] exists on host [X.Y.Z.14]
2019/12/02 12:14:32 [INFO] Image [rancher/rke-tools:v0.1.51] exists on host [X.Y.Z.10]
2019/12/02 12:14:32 [INFO] cluster [c-mb7xc] provisioning: [network] Running etcd <-> etcd port checks
2019/12/02 12:14:32 [INFO] Starting container [rke-port-checker] on host [X.Y.Z.14], try #1
2019/12/02 12:14:32 [INFO] cluster [c-mb7xc] provisioning: [network] Successfully started [rke-port-checker] container on host [X.Y.Z.14]
2019/12/02 12:14:32 [INFO] Removing container [rke-port-checker] on host [X.Y.Z.14], try #1
2019/12/02 12:14:32 [INFO] Starting container [rke-port-checker] on host [X.Y.Z.10], try #1
2019/12/02 12:14:32 [INFO] cluster [c-mb7xc] provisioning: [network] Successfully started [rke-port-checker] container on host [X.Y.Z.10]
2019/12/02 12:14:38 [INFO] Removing container [rke-port-checker] on host [X.Y.Z.10], try #1
2019/12/02 12:14:38 [ERROR] cluster [c-mb7xc] provisioning: [[network] Host [X.Y.Z.14] is not able to connect to the following ports: [X.Y.Z.10:2379, X.Y.Z.10:2380]. Please check network policies and firewall rules]
2019/12/02 12:14:38 [INFO] kontainerdriver rancherkubernetesengine stopped
2019/12/02 12:14:38 [ERROR] ClusterController c-mb7xc [cluster-provisioner-controller] failed with : [[network] Host [X.Y.Z.14] is not able to connect to the following ports: [X.Y.Z.10:2379, X.Y.Z.10:2380]. Please check network policies and firewall rules]
2019-12-02 12:18:26.889193 I | mvcc: store.index: compact 116351
2019-12-02 12:18:26.890642 I | mvcc: finished scheduled compaction at 116351 (took 1.10593ms)
2019/12/02 12:22:38 [INFO] Provisioning cluster [c-mb7xc]
2019/12/02 12:22:38 [INFO] Creating cluster [c-mb7xc]
2019/12/02 12:22:43 [INFO] kontainerdriver rancherkubernetesengine listening on address 127.0.0.1:33176
2019/12/02 12:22:43 [ERROR] Cluster c-mb7xc previously failed to create
2019/12/02 12:22:43 [INFO] cluster [c-mb7xc] provisioning: Initiating Kubernetes cluster
2019/12/02 12:22:43 [INFO] cluster [c-mb7xc] provisioning: [certificates] Generating admin certificates and kubeconfig
2019/12/02 12:22:43 [INFO] cluster [c-mb7xc] provisioning: Successfully Deployed state file at [management-state/rke/rke-153618103/cluster.rkestate]
2019/12/02 12:22:43 [INFO] cluster [c-mb7xc] provisioning: Building Kubernetes cluster
2019/12/02 12:22:43 [INFO] cluster [c-mb7xc] provisioning: [dialer] Setup tunnel for host [X.Y.Z.10]
2019/12/02 12:22:43 [INFO] cluster [c-mb7xc] provisioning: [dialer] Setup tunnel for host [X.Y.Z.14]
2019/12/02 12:22:43 [INFO] [network] Starting stopped container [rke-etcd-port-listener] on host [X.Y.Z.14]
2019/12/02 12:22:43 [INFO] Starting container [rke-etcd-port-listener] on host [X.Y.Z.14], try #1
2019/12/02 12:22:43 [INFO] [network] Starting stopped container [rke-etcd-port-listener] on host [X.Y.Z.10]
2019/12/02 12:22:43 [INFO] Starting container [rke-etcd-port-listener] on host [X.Y.Z.10], try #1
2019/12/02 12:22:43 [INFO] cluster [c-mb7xc] provisioning: [network] Deploying port listener containers
2019/12/02 12:22:43 [WARNING] Can't start Docker container [rke-etcd-port-listener] on host [X.Y.Z.14]: Error response from daemon: driver failed programming external connectivity on endpoint rke-etcd-port-listener (7ef490bf1c3963f131972836836d7f01acf0a7f9f808eede2cf19e57e4b3c62c): Error starting userland proxy: listen tcp 0.0.0.0:2380: bind: address already in use
2019/12/02 12:22:43 [INFO] Starting container [rke-etcd-port-listener] on host [X.Y.Z.14], try #2
2019/12/02 12:22:43 [WARNING] Can't start Docker container [rke-etcd-port-listener] on host [X.Y.Z.10]: Error response from daemon: driver failed programming external connectivity on endpoint rke-etcd-port-listener (f42e672d345f9468871fcc130c432885dde17b70bda4f2dc23d1f7f443ecac6e): Error starting userland proxy: listen tcp 0.0.0.0:2380: bind: address already in use
2019/12/02 12:22:43 [INFO] Starting container [rke-etcd-port-listener] on host [X.Y.Z.10], try #2
2019/12/02 12:22:43 [WARNING] Can't start Docker container [rke-etcd-port-listener] on host [X.Y.Z.10]: Error response from daemon: driver failed programming external connectivity on endpoint rke-etcd-port-listener (11c409cf4e33232e2c5d39ae60981620793ca531482a8f09677e7c3e47750df6): Error starting userland proxy: listen tcp 0.0.0.0:2380: bind: address already in use
2019/12/02 12:22:43 [INFO] Starting container [rke-etcd-port-listener] on host [X.Y.Z.10], try #3
2019/12/02 12:22:43 [WARNING] Can't start Docker container [rke-etcd-port-listener] on host [X.Y.Z.14]: Error response from daemon: driver failed programming external connectivity on endpoint rke-etcd-port-listener (26df18e96a9a328a390a2d3a832cf665c7ef455e46058b0748f0e63e6c356612): Error starting userland proxy: listen tcp 0.0.0.0:2380: bind: address already in use
2019/12/02 12:22:43 [INFO] Starting container [rke-etcd-port-listener] on host [X.Y.Z.14], try #3
2019/12/02 12:22:44 [WARNING] Can't start Docker container [rke-etcd-port-listener] on host [X.Y.Z.10]: Error response from daemon: driver failed programming external connectivity on endpoint rke-etcd-port-listener (142ef4546b5b9afb113ef7282970e84dce1131dce21e32caafde54d870838792): Error starting userland proxy: listen tcp 0.0.0.0:2380: bind: address already in use
2019/12/02 12:22:44 [WARNING] Can't start Docker container [rke-etcd-port-listener] on host [X.Y.Z.14]: Error response from daemon: driver failed programming external connectivity on endpoint rke-etcd-port-listener (8e0eda16eb1e99088e4bd2dd3f5134bf6230fdc03dd10aac24c76e6d71826ac3): Error starting userland proxy: listen tcp 0.0.0.0:2380: bind: address already in use
2019/12/02 12:22:44 [INFO] cluster [c-mb7xc] provisioning: [network] Port listener containers deployed successfully
2019/12/02 12:22:44 [INFO] Image [rancher/rke-tools:v0.1.51] exists on host [X.Y.Z.14]
2019/12/02 12:22:44 [INFO] Image [rancher/rke-tools:v0.1.51] exists on host [X.Y.Z.10]
2019/12/02 12:22:44 [INFO] cluster [c-mb7xc] provisioning: [network] Running etcd <-> etcd port checks
2019/12/02 12:22:44 [INFO] Starting container [rke-port-checker] on host [X.Y.Z.14], try #1
2019/12/02 12:22:44 [INFO] cluster [c-mb7xc] provisioning: [network] Successfully started [rke-port-checker] container on host [X.Y.Z.14]
2019/12/02 12:22:44 [INFO] Starting container [rke-port-checker] on host [X.Y.Z.10], try #1
2019/12/02 12:22:44 [INFO] Removing container [rke-port-checker] on host [X.Y.Z.14], try #1
2019/12/02 12:22:44 [INFO] cluster [c-mb7xc] provisioning: [network] Successfully started [rke-port-checker] container on host [X.Y.Z.10]
2019/12/02 12:22:49 [INFO] Removing container [rke-port-checker] on host [X.Y.Z.10], try #1
2019/12/02 12:22:49 [ERROR] cluster [c-mb7xc] provisioning: [[network] Host [X.Y.Z.14] is not able to connect to the following ports: [X.Y.Z.10:2379, X.Y.Z.10:2380]. Please check network policies and firewall rules]
2019/12/02 12:22:49 [INFO] kontainerdriver rancherkubernetesengine stopped
2019/12/02 12:22:49 [ERROR] ClusterController c-mb7xc [cluster-provisioner-controller] failed with : [[network] Host [X.Y.Z.14] is not able to connect to the following ports: [X.Y.Z.10:2379, X.Y.Z.10:2380]. Please check network policies and firewall rules]
I got past this issue by starting from scratch with Ubuntu and a newer version of Rancher.
I don't believe the operating system was the issue here; there was a known problem in that Rancher version.
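
For anyone debugging the same error, here is a minimal sketch for repeating the port check by hand. It only approximates what the rke-port-checker container appears to do and is not Rancher's or RKE's actual implementation. The function names `can_connect` and `port_is_free` are my own, and the peer address is passed on the command line. Because the log above also shows "bind: address already in use" for port 2380 on both hosts, the sketch additionally checks whether something on the local node is already bound to the etcd ports.

```python
import socket
import sys

ETCD_PORTS = (2379, 2380)  # the ports named in the error message


def can_connect(host: str, port: int, timeout: float = 3.0) -> bool:
    """Return True if a TCP connection to host:port succeeds within timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts and unreachable hosts.
        return False


def port_is_free(port: int, bind_addr: str = "0.0.0.0") -> bool:
    """Return True if this machine can currently bind bind_addr:port."""
    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as s:
        try:
            s.bind((bind_addr, port))
            return True
        except OSError:
            # Typically "address already in use" (binding may also fail
            # for other reasons, such as insufficient permissions).
            return False


if __name__ == "__main__":
    # Optional first argument: the address of the other etcd node.
    peer = sys.argv[1] if len(sys.argv) > 1 else None
    for port in ETCD_PORTS:
        state = "free" if port_is_free(port) else "already in use"
        print(f"local port {port}: {state}")
        if peer is not None:
            reach = "reachable" if can_connect(peer, port) else "NOT reachable"
            print(f"{peer}:{port} is {reach}")
```

Run it on one node with the other node's IP as the argument, for example on X.Y.Z.14 with X.Y.Z.10. A port that is "already in use" locally while no cluster is running may point to a leftover container or process from an earlier attempt. A port that is not reachable from the peer, while something is listening on it on the target node, would point towards firewall rules.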