sql-server monitoring scom

SCOM 2019 - Windows Server 2012 Cluster monitoring issue


I'm testing SCOM 2019 and have observed that the product appears to be broken: it cannot monitor Windows Server 2012 R2 clusters. When I tried to add the cluster nodes, the whole SCOM 2019 installation failed: emails are not sent and configurations are not applied properly. There are messages that some CAST is not valid, but without any details (I presume some SQL data cannot be cast). SCOM 2019 is not able to discover cluster resources properly and add them to the Agentless Management.

I tried modifying the .config file to extend the timeouts, for example to 300 (as described on some other forums), changing the compatibility level of the SCOM 2019 SQL database to a lower one (2012, 2014, 2016), and reinstalling the SCOM agents on the cluster nodes. Nothing works. In the SCOM 2019 Health Service SQL table there are NULLs in many columns for the cluster nodes, and cluster resources such as the SQL cluster name are randomly visible or not, as if SCOM 2019 were unable to discover cluster resources properly.

It looks like SCOM 2019 has been released to production as a broken product. I've contacted Microsoft Support, but at the moment they are not able to solve the issue, and they cannot prepare a hotfix on request because I'm not a Premier MS client :( Any ideas on how to solve this issue are more than welcome.


Solution

  • I've solved the issue. The reason SCOM 2019 cannot monitor Windows Server 2012 R2 clusters (and maybe other versions of Windows Server clusters as well) is in fact not SCOM 2019 itself but the broken/incompatible "Windows Server Branch Cache Management Pack". I do not know what Branch Cache has in common with Windows clusters, but that's how it looks. So if SCOM 2019 hangs, stops sending emails and stops applying configuration after you add a cluster to monitor, and the cluster monitoring itself doesn't work properly, do the following:

    1. Uninstall/delete all "Branch Cache" related management packs from SCOM 2019 (I had two of them: Windows Server Branch Cache and the reporting Branch Cache MP).
    2. Stop the SCOM agents on the cluster nodes.
    3. Manually delete the "Health Service State" folder (from C:\Program Files\Microsoft Monitoring Agent\Agent).
    4. Start the SCOM agent service (the folder will be recreated automatically).
    5. Go to SCOM Console\Operations Manager\Agents Health State.
    6. Select the agents on the affected cluster nodes (one by one) and, from the Task pane, click "Clear/Flush cache and agent health state".

    Within 10-20 minutes the cluster nodes will be properly visible in SCOM 2019 monitoring, and the cluster resources will be properly visible in the Agentless monitoring.
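The agent-side part of the fix (stop the agent, delete the "Health Service State" folder, start the agent again) could be scripted roughly as below. This is only a sketch under a few assumptions that are not stated in the answer: the agent's Windows service is assumed to be named `HealthService`, the folder is assumed to be in the default install path quoted above, and `reset_agent_cache` and `run_command` are hypothetical helper names. It has to run as administrator on each cluster node, and it does not remove the Branch Cache management packs or trigger the console task.

```python
import shutil
import subprocess
from pathlib import Path

# Assumption: default agent install path, as quoted in the answer.
DEFAULT_AGENT_DIR = Path(r"C:\Program Files\Microsoft Monitoring Agent\Agent")
# Assumption: the SCOM agent runs as the Windows service "HealthService".
SERVICE_NAME = "HealthService"


def run_command(args):
    """Run an external command and raise CalledProcessError if it fails."""
    subprocess.run(args, check=True)


def reset_agent_cache(agent_dir=DEFAULT_AGENT_DIR, run=run_command):
    """Stop the agent, delete its "Health Service State" folder, start it again.

    `run` is injectable so the sequence can be dry-run without touching services.
    """
    state_dir = Path(agent_dir) / "Health Service State"
    # "net stop" fails if the service is not running, which aborts the reset.
    run(["net", "stop", SERVICE_NAME])
    try:
        if state_dir.exists():
            shutil.rmtree(state_dir)
    finally:
        # Restart the agent even if the delete failed; it recreates the folder.
        run(["net", "start", SERVICE_NAME])
    return state_dir


if __name__ == "__main__":
    reset_agent_cache()
```

Passing `run=print` instead of the default prints the two `net` commands rather than executing them, which is a cheap way to check the sequence before running it on a production node. Note that the folder is still deleted in that case, so point `agent_dir` at a scratch directory when doing so.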