I am using Cloudify 2.7 with OpenStack Icehouse. In particular, I have configured the cloud driver to bootstrap two management VMs (numberOfManagementMachines 2).
Sometimes, when I bootstrap the VMs, I receive the following error:
cloudify@default> bootstrap-cloud --verbose openstack-icehouse-<project_name>
...
Starting agent and management processes:
[VM_Floating_IP] nohup gs-agent.sh gsa.global.lus 0 gsa.lus 1 gsa.gsc 0 gsa.global.gsm 0 gsa.gsm 1 gsa.global.esm 1 >/dev/null 2>&1
[VM_Floating_IP] STARTING CLOUDIFY MANAGEMENT
[VM_Floating_IP] .
[VM_Floating_IP] Discovered agent nic-address=177.86.0.3 lookup-groups=gigaspaces-Cloudify-2.7.1-ga.
[VM_Floating_IP] Detected LUS management process started by agent null expected agent a0eec4e5-7fb0-4428-80e1-ec13a8b1c744
[VM_Floating_IP] Detected LUS management process started by agent a0eec4e5-7fb0-4428-80e1-ec13a8b1c744
[VM_Floating_IP] Detected GSM management process started by agent a0eec4e5-7fb0-4428-80e1-ec13a8b1c744
[VM_Floating_IP] Waiting for Management processes to start.
[VM_Floating_IP] Waiting for Elastic Service Manager
[VM_Floating_IP] Waiting for Management processes to start.
[VM_Floating_IP] .
[VM_Floating_IP] Waiting for Elastic Service Manager
[VM_Floating_IP] Waiting for Management processes to start.
[VM_Floating_IP] .
[VM_Floating_IP] Waiting for Elastic Service Manager
[VM_Floating_IP] Waiting for Management processes to start.
[VM_Floating_IP] .
[VM_Floating_IP] Waiting for Elastic Service Manager
[VM_Floating_IP] Waiting for Management processes to start.
[VM_Floating_IP] .failure occurred while renewing an event lease: Operation failed. net.jini.core.lease.UnknownLeaseException: Unknown event id: 3
[VM_Floating_IP] at com.sun.jini.reggie.GigaRegistrar.renewEventLeaseInt(GigaRegistrar.java:5494)
[VM_Floating_IP] at com.sun.jini.reggie.GigaRegistrar.renewEventLeaseDo(GigaRegistrar.java:5475)
[VM_Floating_IP] at com.sun.jini.reggie.GigaRegistrar.renewEventLease(GigaRegistrar.java:2836)
[VM_Floating_IP] at com.sun.jini.reggie.RegistrarGigaspacesMethodinternalInvoke16.internalInvoke(Unknown Source)
[VM_Floating_IP] at com.gigaspaces.internal.reflection.fast.AbstractMethod.invoke(AbstractMethod.java:41)
[VM_Floating_IP] at com.gigaspaces.lrmi.LRMIRuntime.invoked(LRMIRuntime.java:464)
[VM_Floating_IP] at com.gigaspaces.lrmi.nio.Pivot.consumeAndHandleRequest(Pivot.java:561)
[VM_Floating_IP] at com.gigaspaces.lrmi.nio.Pivot.handleRequest(Pivot.java:662)
[VM_Floating_IP] at com.gigaspaces.lrmi.nio.Pivot$ChannelEntryTask.run(Pivot.java:196)
[VM_Floating_IP] at java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:886)
[VM_Floating_IP] at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:908)
[VM_Floating_IP] at java.lang.Thread.run(Thread.java:662)
[VM_Floating_IP]
[VM_Floating_IP]
[VM_Floating_IP] Waiting for Elastic Service Manager
[VM_Floating_IP] Waiting for Management processes to start.
....
[VM_Floating_IP] ....Failed to add [Processing Unit Instance] with uid [8038e956-1ae2-4378-8bb1-e2055202c160]: Operation failed. java.rmi.ConnectException: Connect Failed to [NIO://177.86.0.3:7011/pid[4390]/164914896032_3_8060218823096628119_details[class org.openspaces.pu.container.servicegrid.PUServiceBeanImpl]]; nested exception is:
[VM_Floating_IP] java.net.SocketTimeoutException
...
[VM_Floating_IP] Failed to add [GSM] with uid [3c0e20e9-bf85-4d22-8ed6-3b387e690878]: Operation failed. java.rmi.ConnectException: Connect Failed to [NIO://177.86.0.3:7000/pid[4229]/154704895271_2_2245795805687723285_details[class com.gigaspaces.grid.gsm.GSMImpl]]; nested exception is:
[VM_Floating_IP] java.net.SocketTimeoutException
...
[VM_Floating_IP] Failed to add GSC with uid [8070dabb-d80d-43c7-bd9c-1d2478f95710]: Operation failed. java.rmi.ConnectException: Connect Failed to [NIO://177.86.0.3:7011/pid[4390]/164914896020_2_8060218823096628119_details[class com.gigaspaces.grid.gsc.GSCImpl]]; nested exception is:
[VM_Floating_IP] java.net.SocketTimeoutException
...
[VM_Floating_IP] Failed to add [GSA] with uid [a0eec4e5-7fb0-4428-80e1-ec13a8b1c744]: Operation failed. java.rmi.ConnectException: Connect Failed to [NIO://177.86.0.3:7002/pid[4086]/153569177936_2_8701370873164361474_details[class com.gigaspaces.grid.gsa.GSAImpl]]; nested exception is:
[VM_Floating_IP] java.net.SocketTimeoutException
...
[VM_Floating_IP] Waiting for Management processes to start.
[VM_Floating_IP] Failed to connect to LUS on 177.86.0.3:4174, retry in 73096ms: Operation failed. java.net.ConnectException: Connection timed out
...
[VM_Floating_IP] .Failed to add [ESM] with uid [996c8898-897c-4416-a877-82efb22c7ea6]: Operation failed. java.rmi.ConnectException: Connect Failed to [NIO://177.86.0.3:7003/pid[4504]/172954418920_2_5475350805758957057_details[class org.openspaces.grid.esm.ESMImpl]]; nested exception is:
[VM_Floating_IP] java.net.SocketTimeoutException
Can someone suggest a solution? Do I need to configure a timeout value somewhere?
Thanks.
------------------------Edited-------------------
I would like to add some information.
Each manager instance has 4 vCPUs, 8 GB RAM and a 20 GB disk.
Each manager instance is assigned the security groups created by Cloudify, namely:
cloudify-manager-cluster:
  Egress  IPv4 Any - 0.0.0.0/0 (CIDR)
  Egress  IPv6 Any - ::/0 (CIDR)

cloudify-manager-management:
  Egress  IPv4 Any - 0.0.0.0/0 (CIDR)
  Egress  IPv6 Any - ::/0 (CIDR)
  Ingress IPv4 TCP 22          0.0.0.0/0 (CIDR)
  Ingress IPv4 TCP 4174        cfy-mngt-cluster
  Ingress IPv4 TCP 6666        cfy-mngt-cluster
  Ingress IPv4 TCP 7000        cfy-mngt-cluster
  Ingress IPv4 TCP 7001        cfy-mngt-cluster
  Ingress IPv4 TCP 7002        cfy-mngt-cluster
  Ingress IPv4 TCP 7003        cfy-mngt-cluster
  Ingress IPv4 TCP 7010 - 7110 cfy-mngt-cluster
  Ingress IPv4 TCP 8099        0.0.0.0/0 (CIDR)
  Ingress IPv4 TCP 8100        0.0.0.0/0 (CIDR)
Moreover, Cloudify creates a private network "cloudify-manager-Cloudify-Management-Network" with subnet 177.86.0.0/24, and it allocates a floating IP for each VM.
The ESM is Cloudify's orchestrator, and only one instance of it should be running at any given time. The error indicates that the bootstrap process expected to find a running ESM but did not find one. This appears to be caused by communication errors between the manager instances - is it possible that the security groups defined for the managers do not open all of the required ports between them?
Security group/firewall configuration is the usual culprit. It is also possible that the manager VM is too small - it should have at least 4 GB of RAM and 2 vCPUs.
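A quick way to rule out a firewall/security-group problem is to check, from one manager, whether the other manager's Cloudify ports are reachable over the private network. The following is only a rough sketch in Python (run it on one manager; the target address 177.86.0.4 is a placeholder for the other manager's private IP, and the port list is copied from the security group rules you posted):

import socket

# Ports used by the Cloudify 2.x management components, taken from the
# security group rules listed in the question.
PORTS = [4174, 6666, 7000, 7001, 7002, 7003, 8099, 8100] + list(range(7010, 7111))

TARGET = "177.86.0.4"   # placeholder: private IP of the *other* manager VM

for port in PORTS:
    s = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    s.settimeout(3)
    try:
        s.connect((TARGET, port))
        print("port %d reachable" % port)
    except (socket.timeout, socket.error) as e:
        # A timeout usually means the security group / firewall is dropping the
        # traffic; a plain "connection refused" only means nothing is listening
        # on that port yet, which is expected before the managers are fully up.
        print("port %d NOT reachable: %s" % (port, e))
    finally:
        s.close()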
Please keep in mind that Cloudify 2.X has reached end-of-life and is no longer supported. You may want to check out Cloudify 3.