azure-service-fabric service-fabric-on-premises

Service Fabric cluster: can't get node 0 to rejoin a 3-node cluster


We have a 3-node Service Fabric cluster. Node 0, which is the node we used to set up the cluster, is running, but it is not listed under the system ClusterManagerService or the other system services, although it is listed under the FailoverManagerService.


How can I add it back in? I'm stumped at the moment; I've spent most of the day on this and am none the wiser.

With no answers I am thinking I will just need to remove the cluster and then recreate it.


Solution

  • I was unable to recover it, so I removed the cluster and recreated it with the following PowerShell commands, run on one of the nodes.

    Connect-ServiceFabricCluster
    

    Get the current configuration for the cluster

    Get-ServiceFabricClusterConfiguration > C:\temp\train_cluster_config_old.json
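
    Before removing anything, it may be worth reading the saved file back to confirm that it parses and lists all three nodes. This is only a sketch: it assumes the file is the standalone cluster configuration JSON with a top-level nodes array whose entries carry nodeName and iPAddress fields, as in the sample configs that ship with the standalone package.

    ```powershell
    # Read the saved configuration back and list the nodes it defines.
    $configPath = 'C:\temp\train_cluster_config_old.json'
    $config = Get-Content -Path $configPath -Raw | ConvertFrom-Json

    # Assumption: a top-level "nodes" array with nodeName / iPAddress fields.
    # PowerShell property access is case-insensitive, so casing does not matter.
    $config.nodes | Select-Object nodeName, iPAddress | Format-Table
    ```

    If node 0 is missing from that list, fix the file before reusing it to recreate the cluster.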
    

    With the config we can now remove the cluster

    Remove-ServiceFabricCluster -ClusterConfigFilePath C:\temp\train_cluster_config_old.json
    

    You may have to remove any leftover node folders under C:\ProgramData\SF; otherwise the next steps will tell you to remove them.
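
    To see whether anything was left behind on a node, something like the following can be run locally on each machine. It is a cautious sketch: the Remove-Item line uses -WhatIf, so it only reports what would be deleted until you remove that switch.

    ```powershell
    # Folder mentioned above; checked on the local node only.
    $sfData = 'C:\ProgramData\SF'

    if (Test-Path -Path $sfData) {
        # Show what is still there.
        Get-ChildItem -Path $sfData -Directory | Select-Object FullName, LastWriteTime

        # Dry run: -WhatIf lists what would be removed without deleting anything.
        Get-ChildItem -Path $sfData -Directory | Remove-Item -Recurse -Force -WhatIf
    } else {
        Write-Output "No leftover folder found at $sfData"
    }
    ```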

    Make sure you are happy with the config, then test it with the scripts from the standalone Service Fabric package, which you will need to download onto one node: https://go.microsoft.com/fwlink/?LinkId=730690

    .\TestConfiguration.ps1 -ClusterConfigFilePath C:\temp\train_cluster_config_old.json -FabricRuntimePackagePath C:\temp\Microsoft.Azure.ServiceFabric.WindowsServer.8.1.321.9590\DeploymentRuntimePackages\MicrosoftAzureServiceFabric.8.1.321.9590.cab
    

    If that all succeeds then run the command that will create the cluster

    .\CreateServiceFabricCluster.ps1 -ClusterConfigFilePath C:\temp\train_cluster_config_old.json -FabricRuntimePackagePath C:\temp\Microsoft.Azure.ServiceFabric.WindowsServer.8.1.321.9590\DeploymentRuntimePackages\MicrosoftAzureServiceFabric.8.1.321.9590.cab
    

    Give it a few minutes to start up, then navigate to https://localhost:19080/Explorer/index.html on the nodes to make sure it's running.
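
    The same check can be made from PowerShell on one of the nodes. This is a minimal sketch that connects the same way as the first step above; a secured cluster may need certificate or Windows credential parameters on Connect-ServiceFabricCluster. The last command looks at the ClusterManagerService replicas, since that is where node 0 was missing before.

    ```powershell
    # Connect to the local cluster, as in the first step.
    Connect-ServiceFabricCluster

    # All three nodes should be listed as Up with an Ok health state.
    Get-ServiceFabricNode | Format-Table NodeName, IpAddressOrFQDN, NodeStatus, HealthState

    # Overall cluster health.
    Get-ServiceFabricClusterHealth | Select-Object AggregatedHealthState

    # Which nodes host replicas of the ClusterManagerService system service.
    Get-ServiceFabricPartition -ServiceName fabric:/System/ClusterManagerService |
        Get-ServiceFabricReplica |
        Format-Table NodeName, ReplicaRole, ReplicaStatus
    ```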

    You will now need to deploy all your applications again, as the cluster will be empty.
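
    For reference, redeploying one application from PowerShell looks roughly like this. Everything specific here is a placeholder, not from my setup: the package path, the MyAppType name, the 1.0.0 version and the fabric:/MyApp name are hypothetical, and the fabric:ImageStore connection string is an assumption that depends on the ImageStoreConnectionString setting in your cluster config.

    ```powershell
    Connect-ServiceFabricCluster

    # Hypothetical values; replace with your own package, type and version.
    $packagePath = 'C:\temp\MyAppPkg'
    $imageStore  = 'fabric:ImageStore'   # assumption: native image store service

    # Upload the package, register its type, then create an instance.
    Copy-ServiceFabricApplicationPackage -ApplicationPackagePath $packagePath `
        -ImageStoreConnectionString $imageStore `
        -ApplicationPackagePathInImageStore 'MyAppPkg'

    Register-ServiceFabricApplicationType -ApplicationPathInImageStore 'MyAppPkg'

    New-ServiceFabricApplication -ApplicationName 'fabric:/MyApp' `
        -ApplicationTypeName 'MyAppType' `
        -ApplicationTypeVersion '1.0.0'
    ```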