openstack cloudify

Error on bootstrap of Management VM


I am using Cloudify 2.7 with OpenStack Icehouse. In particular, I have configured the cloud driver to bootstrap 2 Management VMs (numberOfManagementMachines 2).

Sometimes, when I bootstrap the VMs, I receive the following error:

cloudify@default> bootstrap-cloud --verbose openstack-icehouse-<project_name>
...
Starting agent and management processes:
[VM_Floating_IP] nohup gs-agent.sh gsa.global.lus 0 gsa.lus 1 gsa.gsc 0 gsa.global.gsm 0 gsa.gsm 1 gsa.global.esm 1 >/dev/null 2>&1
[VM_Floating_IP] STARTING CLOUDIFY MANAGEMENT
[VM_Floating_IP] .
[VM_Floating_IP] Discovered agent nic-address=177.86.0.3 lookup-groups=gigaspaces-Cloudify-2.7.1-ga.
[VM_Floating_IP] Detected LUS management process started by agent null  expected agent a0eec4e5-7fb0-4428-80e1-ec13a8b1c744
[VM_Floating_IP] Detected LUS management process started by agent a0eec4e5-7fb0-4428-80e1-ec13a8b1c744
[VM_Floating_IP] Detected GSM management process started by agent a0eec4e5-7fb0-4428-80e1-ec13a8b1c744
[VM_Floating_IP] Waiting for Management processes to start.
[VM_Floating_IP] Waiting for Elastic Service Manager
[VM_Floating_IP] Waiting for Management processes to start.
[VM_Floating_IP] .
[VM_Floating_IP] Waiting for Elastic Service Manager
[VM_Floating_IP] Waiting for Management processes to start.
[VM_Floating_IP] .
[VM_Floating_IP] Waiting for Elastic Service Manager
[VM_Floating_IP] Waiting for Management processes to start.
[VM_Floating_IP] .
[VM_Floating_IP] Waiting for Elastic Service Manager
[VM_Floating_IP] Waiting for Management processes to start.
[VM_Floating_IP] .failure occurred while renewing an event lease: Operation failed. net.jini.core.lease.UnknownLeaseException: Unknown event id: 3
[VM_Floating_IP]        at com.sun.jini.reggie.GigaRegistrar.renewEventLeaseInt(GigaRegistrar.java:5494)
[VM_Floating_IP]        at com.sun.jini.reggie.GigaRegistrar.renewEventLeaseDo(GigaRegistrar.java:5475)
[VM_Floating_IP]        at com.sun.jini.reggie.GigaRegistrar.renewEventLease(GigaRegistrar.java:2836)
[VM_Floating_IP]        at com.sun.jini.reggie.RegistrarGigaspacesMethodinternalInvoke16.internalInvoke(Unknown Source)
[VM_Floating_IP]        at com.gigaspaces.internal.reflection.fast.AbstractMethod.invoke(AbstractMethod.java:41)
[VM_Floating_IP]        at com.gigaspaces.lrmi.LRMIRuntime.invoked(LRMIRuntime.java:464)
[VM_Floating_IP]        at com.gigaspaces.lrmi.nio.Pivot.consumeAndHandleRequest(Pivot.java:561)
[VM_Floating_IP]        at com.gigaspaces.lrmi.nio.Pivot.handleRequest(Pivot.java:662)
[VM_Floating_IP]        at com.gigaspaces.lrmi.nio.Pivot$ChannelEntryTask.run(Pivot.java:196)
[VM_Floating_IP]        at java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:886)
[VM_Floating_IP]        at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:908)
[VM_Floating_IP]        at java.lang.Thread.run(Thread.java:662)
[VM_Floating_IP]
[VM_Floating_IP]
[VM_Floating_IP] Waiting for Elastic Service Manager
[VM_Floating_IP] Waiting for Management processes to start.
....
[VM_Floating_IP] ....Failed to add [Processing Unit Instance] with uid [8038e956-1ae2-4378-8bb1-e2055202c160]: Operation failed. java.rmi.ConnectException: Connect Failed to [NIO://177.86.0.3:7011/pid[4390]/164914896032_3_8060218823096628119_details[class org.openspaces.pu.container.servicegrid.PUServiceBeanImpl]]; nested exception is: 
[VM_Floating_IP]        java.net.SocketTimeoutException
...
[VM_Floating_IP] Failed to add [GSM] with uid [3c0e20e9-bf85-4d22-8ed6-3b387e690878]: Operation failed. java.rmi.ConnectException: Connect Failed to [NIO://177.86.0.3:7000/pid[4229]/154704895271_2_2245795805687723285_details[class com.gigaspaces.grid.gsm.GSMImpl]]; nested exception is:
[VM_Floating_IP]        java.net.SocketTimeoutException
...
[VM_Floating_IP] Failed to add GSC with uid [8070dabb-d80d-43c7-bd9c-1d2478f95710]: Operation failed. java.rmi.ConnectException: Connect Failed to [NIO://177.86.0.3:7011/pid[4390]/164914896020_2_8060218823096628119_details[class com.gigaspaces.grid.gsc.GSCImpl]]; nested exception is:
[VM_Floating_IP]        java.net.SocketTimeoutException
...
[VM_Floating_IP] Failed to add [GSA] with uid [a0eec4e5-7fb0-4428-80e1-ec13a8b1c744]: Operation failed. java.rmi.ConnectException: Connect Failed to [NIO://177.86.0.3:7002/pid[4086]/153569177936_2_8701370873164361474_details[class com.gigaspaces.grid.gsa.GSAImpl]]; nested exception is:
[VM_Floating_IP]        java.net.SocketTimeoutException
...
[VM_Floating_IP] Waiting for Management processes to start.
[VM_Floating_IP] Failed to connect to LUS on 177.86.0.3:4174, retry in 73096ms: Operation failed. java.net.ConnectException: Connection timed out
...
[VM_Floating_IP] .Failed to add [ESM] with uid [996c8898-897c-4416-a877-82efb22c7ea6]: Operation failed. java.rmi.ConnectException: Connect Failed to [NIO://177.86.0.3:7003/pid[4504]/172954418920_2_5475350805758957057_details[class org.openspaces.grid.esm.ESMImpl]]; nested exception is:
[VM_Floating_IP]        java.net.SocketTimeoutException

Can anyone suggest a solution? Do I need to configure a timeout value?

Thanks.

------------------------Edited-------------------

I would like to add some information.

Each manager instance has 4 vCPUs, 8 GB RAM, and a 20 GB disk.

Each manager instance uses the security groups created by Cloudify, namely:

cloudify-manager-cluster    

Egress  IPv4    Any -       0.0.0.0/0 (CIDR)    
Egress  IPv6    Any -       ::/0 (CIDR)

cloudify-manager-management

Egress  IPv4    Any -       0.0.0.0/0 (CIDR)    
Egress  IPv6    Any -       ::/0 (CIDR) 
Ingress IPv4    TCP 22      0.0.0.0/0 (CIDR)    
Ingress IPv4    TCP 4174    cfy-mngt-cluster    
Ingress IPv4    TCP 6666    cfy-mngt-cluster    
Ingress IPv4    TCP 7000    cfy-mngt-cluster    
Ingress IPv4    TCP 7001    cfy-mngt-cluster    
Ingress IPv4    TCP 7002    cfy-mngt-cluster    
Ingress IPv4    TCP 7003    cfy-mngt-cluster    
Ingress IPv4    TCP 7010 - 7110 cfy-mngt-cluster    
Ingress IPv4    TCP 8099    0.0.0.0/0 (CIDR)    
Ingress IPv4    TCP 8100    0.0.0.0/0 (CIDR)

Moreover, Cloudify creates a private network "cloudify-manager-Cloudify-Management-Network" with subnet 177.86.0.0/24, and it requests a floating IP for each VM.


Solution

  • The ESM is Cloudify's orchestrator. Only one instance of it should be running at any one time. The error indicates that the bootstrap process was expecting to find a running ESM but did not find one. This seems to be related to communication errors between the manager instances. Is it possible that the security groups defined for the managers do not open all ports between them?

    Security group/firewall configurations are the usual problem. It is also possible that the manager VM is too small: it should have at least 4 GB of RAM and 2 vCPUs.

    Please keep in mind that Cloudify 2.X has reached end-of-life and is no longer supported. You may want to check out Cloudify 3.
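
    One way to test the security-group theory is to probe the management ports from one manager VM to the other. The sketch below is only an illustration and is not part of Cloudify: the `probe` and `report` helpers and the `PORTS_FROM_LOG` list are hypothetical names, and the ports are simply the ones that appear in the failing log lines in the question. A `timeout` result would be consistent with packets being dropped between the managers, whereas `refused` would suggest the host is reachable but nothing is listening on that port.

```python
import socket

# Ports seen in the failing log lines of the question (LUS, GSM, GSA, ESM, GSC).
# This list is an assumption for illustration; adjust it to your own rules.
PORTS_FROM_LOG = [4174, 7000, 7002, 7003, 7011]


def probe(host, port, timeout=3.0):
    """Try a TCP connection and classify the outcome as a short string."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return "open"
    except ConnectionRefusedError:
        # The host answered, but no process is listening on this port.
        return "refused"
    except (socket.timeout, TimeoutError):
        # No answer at all: packets may be dropped, e.g. by a security group.
        return "timeout"
    except OSError as exc:
        # Anything else (no route, name resolution failure, ...).
        return f"error: {exc}"


def report(host, ports, timeout=3.0):
    """Print one line per port and return the ports that did not connect."""
    failed = []
    for port in ports:
        status = probe(host, port, timeout)
        print(f"{host}:{port} -> {status}")
        if status != "open":
            failed.append(port)
    return failed
```

    Run it on one manager against the private address of the other, for example `report("177.86.0.3", PORTS_FROM_LOG)`, while the management processes are starting. If the ports time out only from the second manager, the ingress rules between the managers are the first thing to review.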