cloud-foundry pcf pivotal-web-services

The 4 levels of High Availability in PCF: does BOSH or monit handle failed instances?


As I understand it from PCF's 4 levels of High Availability, when an instance (process) fails, monit should detect the failure and restart the process, and then simply report it to BOSH. But if the whole VM goes down, it is BOSH's responsibility to detect that and restart the VM.

With this belief I answered one of the questions at: https://djitz.com/guides/pivotal-cloud-foundry-pcf-certification-exam-review-questions-and-answers-set-4-logging-scaling-and-high-availability/

Question and answer

I think the answer to this question should be option 3, but the site says I'm wrong and that the answer should be option 2. Now I'm confused, so please tell me if my understanding is wrong.


Solution

  • BOSH is responsible for creating a new instance to replace a failed VM. I know there is not much information available on the internet about this, but if you get the chance, there is a tutorial on Pluralsight you can enroll in, where the instructor explains high availability very well. You can also get a high-level idea from the PCF documentation:

    Process Monitoring: PCF uses a BOSH agent, monit, to monitor the processes on the component VMs that work together to keep your applications running, such as nsync, BBS, and Cell Rep. If monit detects a failure, it restarts the process and notifies the BOSH agent on the VM. The BOSH agent notifies the BOSH Health Monitor, which triggers responders through plugins such as email notifications or paging.

    Resurrection for VMs: BOSH detects if a VM is present by listening for heartbeat messages that are sent from the BOSH agent every 60 seconds. The BOSH Health Monitor listens for those heartbeats. When the Health Monitor finds that a VM is not responding, it passes an alert to the Resurrector component. If the Resurrector is enabled, it sends the IaaS a request to create a new VM instance to replace the one that failed.
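To make the split of responsibilities concrete, here is a small toy simulation in Python. It is only a sketch of the two behaviours quoted above, not how BOSH or monit are actually implemented. The names `VM`, `monit_cycle`, `agent_heartbeat`, and `health_monitor_cycle` are made up for illustration, and the 180-second timeout is an assumption (the quoted text only gives the 60-second heartbeat interval).

```python
from dataclasses import dataclass

HEARTBEAT_INTERVAL = 60   # seconds between agent heartbeats, per the quoted docs
HEARTBEAT_TIMEOUT = 180   # ASSUMPTION: illustrative threshold, not from the docs


@dataclass
class VM:
    """Hypothetical stand-in for a component VM and its monitored processes."""
    name: str
    processes: dict            # process name -> True if running
    alive: bool = True
    last_heartbeat: float = 0.0


def monit_cycle(vm, alerts):
    """Process level: monit runs on the VM and restarts failed processes."""
    if not vm.alive:
        return  # monit goes down with the VM, so it cannot help here
    for proc in list(vm.processes):
        if not vm.processes[proc]:
            vm.processes[proc] = True  # restart the process in place
            # monit -> BOSH agent -> Health Monitor: this is only a notification
            alerts.append(f"{vm.name}: monit restarted {proc}")


def agent_heartbeat(vm, now):
    """The BOSH agent sends a heartbeat only while the VM is alive."""
    if vm.alive:
        vm.last_heartbeat = now


def health_monitor_cycle(vms, now, alerts, resurrector_enabled=True):
    """VM level: the Health Monitor notices missing heartbeats."""
    for name, vm in list(vms.items()):
        if now - vm.last_heartbeat > HEARTBEAT_TIMEOUT:
            alerts.append(f"{name}: VM unresponsive")
            if resurrector_enabled:
                # Stand-in for asking the IaaS to create a replacement VM
                vms[name] = VM(name, {p: True for p in vm.processes},
                               last_heartbeat=now)
                alerts.append(f"{name}: resurrector recreated VM")
```

In this sketch, setting a process to `False` and calling `monit_cycle` fixes it on the same VM and only produces an alert. Setting `alive = False` stops the heartbeats, and only `health_monitor_cycle` (with the resurrector enabled) brings the VM back. That matches the distinction in the docs: monit handles failed processes, BOSH handles failed VMs.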