azure azure-service-fabric service-fabric-stateful

Running a Windows Service as a stateful service in Service Fabric

I have three .NET programs currently running as Windows services. We are migrating to Service Fabric and I have a few questions. Our intent is to migrate them to stateful services, since we need to keep track of file locations, batch size, etc., which are currently stored in an app.config file. That way we can "lift and shift" the code from the onTimer event into RunAsync, as discussed in this Stack Overflow question: How to Migrate Windows Service in Azure Service Fabric

However there are some questions I have about these services. Of course part of using SF is to have the applications in a reliable environment to keep these applications available as much as possible, so the first question is:

Should we deploy the service to only one node and use a reliable 
collection to maintain the state of the process in case the node goes down 
and has to be brought back up?  

Or should we deploy the application to, say, three nodes and have each 
instance check the reliable collection to see whether another instance is 
already processing files, and wait if so?

The application will "wake up" at a set interval and look at a folder; if there are any files in the folder, it will process them. This could take anywhere from a couple of seconds to many minutes. So if the application were running on three nodes, it is entirely possible that the other two instances would wake up and try to process the same files. If they could check a reliable dictionary to see whether another instance is already processing files, they could simply wait until the next interval.

I know this is vague. I am looking for input on whether to launch the application on multiple nodes or on a single node.

Solution

In short: stateful services have partitioned data, so you will have at least one partition and probably more. For each partition, a primary replica will be up and running, serving requests or doing work. Each primary also has a number of secondary replicas that will take over when the primary fails. More info here.

In the configuration of the service you specify the number of partitions and the replica count:

<Service Name="Processing">
    <StatefulService ServiceTypeName="ProcessingType" TargetReplicaSetSize="[Processing_TargetReplicaSetSize]" MinReplicaSetSize="[Processing_MinReplicaSetSize]">
        <UniformInt64Partition PartitionCount="[Processing_PartitionCount]" LowKey="0" HighKey="25" />
    </StatefulService>
</Service>
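
The bracketed values are placeholders that are resolved from application parameters. A minimal sketch of how they might be declared in the Parameters section of ApplicationManifest.xml, assuming the parameter names used above; the default values are illustrative only and should be adjusted to the size of your cluster:

```xml
<Parameters>
  <!-- Illustrative defaults, not values from the original configuration -->
  <Parameter Name="Processing_PartitionCount" DefaultValue="1" />
  <Parameter Name="Processing_MinReplicaSetSize" DefaultValue="3" />
  <Parameter Name="Processing_TargetReplicaSetSize" DefaultValue="3" />
</Parameters>
```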

The primary and secondary replicas will be distributed over the cluster nodes, so that, for example, when the node running the primary replica goes down, a secondary replica on another node will take over.

There is more to it than what I have described but this is the basic idea behind it all.

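So to answer your question: you should specify enough replicas on other nodes to guarantee high availability. Since only the primary replica of a partition does the work, the other replicas do not need to check a reliable dictionary to see whether someone else is already processing files.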
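
For a single folder-polling workload like the one in the question, one possible configuration is a single partition with three replicas, so that one primary does the work while two secondaries stand by on other nodes. This is only a sketch under that assumption: the service and type names are reused from the snippet above, and the literal replica counts are illustrative:

```xml
<Service Name="Processing">
  <StatefulService ServiceTypeName="ProcessingType" TargetReplicaSetSize="3" MinReplicaSetSize="3">
    <!-- One partition: only its primary replica does the processing work -->
    <SingletonPartition />
  </StatefulService>
</Service>
```

With this layout the cluster needs at least three nodes so that each replica can be placed on a different node.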