I am upgrading an application on Service Fabric and one of the replicas is showing the following warning:
Unhealthy event: SourceId='System.RAP', Property='IStatefulServiceReplica.ChangeRole(S)Duration', HealthState='Warning', ConsiderWarningAsError=false. The api IStatefulServiceReplica.ChangeRole(S) on node _gtmsf1_0 is stuck. Start Time (UTC): 2018-03-21 15:49:54.326.
After some debugging, I suspect I'm not properly honoring a cancellation token. In the meantime, how do I safely force a restart of this stuck replica to get the service working again?
Partial results of Get-ServiceFabricDeployedReplica:
...
ReplicaRole : ActiveSecondary
ReplicaStatus : Ready
ServiceTypeName : MarketServiceType
...
ServicePackageActivationId :
CodePackageName : Code
...
HostProcessId : 6180
ReconfigurationInformation : {
PreviousConfigurationRole : Primary
ReconfigurationPhase : Phase0
ReconfigurationType : SwapPrimary
ReconfigurationStartTimeUtc : 3/21/2018 3:49:54 PM
}
You might be able to pipe that directly to Restart-ServiceFabricReplica. If the replica remains stuck, you should be able to use Get-ServiceFabricDeployedCodePackage and Restart-ServiceFabricDeployedCodePackage to restart the surrounding host process. Because Restart-ServiceFabricDeployedCodePackage also has options for selecting random packages to simulate failure, be sure to target the specific code package you want to restart.
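As a rough sketch of those two steps, something like the following might work. The node name, service type name, and code package name come from the output in the question. The application URI `fabric:/MyApp` and the manifest name `MarketServicePkg` are placeholders I made up, so substitute your own values. It also assumes you have already run Connect-ServiceFabricCluster and that the deployed replica objects expose `PartitionId` and `ReplicaId` as they do for stateful services.

```powershell
# Assumes the Service Fabric PowerShell module is loaded and
# Connect-ServiceFabricCluster has already been run in this session.

$nodeName = '_gtmsf1_0'       # node named in the health warning
$appName  = 'fabric:/MyApp'   # hypothetical placeholder: use your application URI

# Step 1: find the replica(s) of the affected service type on that node
# and ask Service Fabric to restart each one explicitly.
$stuck = Get-ServiceFabricDeployedReplica -NodeName $nodeName -ApplicationName $appName |
    Where-Object { $_.ServiceTypeName -eq 'MarketServiceType' }

foreach ($replica in $stuck) {
    Restart-ServiceFabricReplica -NodeName $nodeName `
        -PartitionId $replica.PartitionId `
        -ReplicaOrInstanceId $replica.ReplicaId
}

# Step 2: run this only if the replica is still stuck after step 1.
# List the deployed code packages first to confirm the manifest and
# code package names, then restart exactly that one package.
Get-ServiceFabricDeployedCodePackage -NodeName $nodeName -ApplicationName $appName

$manifestName = 'MarketServicePkg'   # hypothetical placeholder: read it from the listing above

Restart-ServiceFabricDeployedCodePackage -NodeName $nodeName `
    -ApplicationName $appName `
    -ServiceManifestName $manifestName `
    -CodePackageName 'Code'
```

Naming the node, application, manifest, and code package explicitly keeps the restart scoped to the one host process (HostProcessId 6180 in the output above) and avoids the selection modes that pick a target for you. Keep in mind that restarting the code package takes down every replica hosted in that process, not just the stuck one.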