azure-service-fabric, service-fabric-stateful, service-fabric-actor

Why are my Service Fabric actors using more disk space than expected?


I am trying to understand why our actor service is using more disk space than expected. The service currently contains around 80,000 actors distributed over 10 partitions, and each actor stores around 150 KB of state.

Looking at one (out of 10) nodes in our cluster, I would expect to see:

Another thing that I am trying to understand is the following:

So my questions are:

  1. Are my expectations correct?
  2. What could explain my observations?

Solution

  • Drilling down into one partition folder, I would expect to see just one replica id

    If things have been running for a while, I'd expect to see more than one. This is because of two things:

    1. Service Fabric keeps the information for failed replicas around on the nodes for at least the ReplicaRestartWaitDuration, so that if local recovery is possible, the necessary information is still on the node. These files can accumulate if, for example, a replica just failed and can't be cleanly dropped. They can also be present if someone "ForceRemoved" individual replicas, since that explicitly skips clean shutdown; this is part of why we generally don't recommend using that command in production environments.
    2. There's also a setting known as the "UserStandbyReplicaKeepDuration" which governs how long SF keeps old replicas around that are not needed right now, in case they are needed later (because it's usually cheaper to rebuild from partial state than full state).

      a. For example, say the node that some replica was on failed and stayed down longer than the ReplicaRestartWaitDuration for that service. When this happens, SF builds a replacement replica to get you back up to your TargetReplicaSetSize.

      b. Let's say that once that replica is built the node that failed comes back.

      c. If we're still within the StandbyReplicaKeepDuration for that replica, then SF will just leave it there on disk. If there's another failure in the meantime, SF will usually (depending on the Cluster Resource Manager settings, whether this node is a valid target, etc.) pick this partial replica and rebuild the replacement from what remains on the drive.

      So you can see replicas from the past whose information is still being kept on the drives, but you generally shouldn't see anything older than the UserStandbyReplicaKeepDuration (by default a week). You can always reduce that duration in your cluster if you want.
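    The retention rule above can be sketched as a small diagnostic check. This is a hypothetical helper (`stale_replica_folders` is not a Service Fabric API), and it assumes that a replica folder's last-modified time is a reasonable proxy for how long the replica has been unused; the one-week default comes from the answer above and should be checked against your cluster's configured UserStandbyReplicaKeepDuration.

    ```python
    from datetime import datetime, timedelta

    # Default UserStandbyReplicaKeepDuration is one week (per the answer above;
    # check your cluster configuration for the value actually in effect).
    STANDBY_KEEP_DURATION = timedelta(days=7)

    def stale_replica_folders(folder_mtimes, now=None):
        """Given a mapping of replica-folder name -> last-modified datetime,
        return the folders older than the standby keep duration. These are
        the ones you generally should NOT still see on disk."""
        now = now or datetime.now()
        return [name for name, mtime in folder_mtimes.items()
                if now - mtime > STANDBY_KEEP_DURATION]
    ```

    Anything this flags that is older than the keep duration is worth investigating (e.g. a replica that was ForceRemoved and never cleaned up), rather than normal standby retention.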

    I would expect the folder to contain not much more than the size of 8,000 actors taking up around 150 KB each, so around 1.14 GB of data. Not as expected: the folder contains a file ActorStateStore whose size is 5.66 GB.

    This is a bit more puzzling. Let's go back to the amount of stuff we expect to be on a given node. You say you have 80K actors. I presume you have a TargetReplicaSetSize of 3, so that's really more like 240K actor copies. Each actor is ~150 KB of state, so that's ~34 GB of state for the cluster. Per node, then, we'd expect ~3.4 GB of state. (I think your original estimate forgot replication. If you've actually got a TargetReplicaSetSize of 1, then let me know and we can recalculate.)
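    The back-of-envelope math above can be written out explicitly (assumptions, as in the answer: 10 nodes, TargetReplicaSetSize of 3, replicas spread evenly across nodes):

    ```python
    # Sizing estimate from the numbers in the question.
    actors = 80_000          # total actors across the service
    state_per_actor_kb = 150 # state per actor
    replicas = 3             # assumed TargetReplicaSetSize
    nodes = 10               # nodes in the cluster

    # Total state stored cluster-wide, counting every replica copy.
    cluster_state_gb = actors * replicas * state_per_actor_kb / 1024 / 1024

    # Evenly spread, this is what lands on each node.
    per_node_gb = cluster_state_gb / nodes

    print(f"cluster: ~{cluster_state_gb:.1f} GB, per node: ~{per_node_gb:.1f} GB")
    ```

    With replication counted, the per-node expectation roughly triples compared to the 1.14 GB-per-partition estimate in the question.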

    ~3.4 GB is closer to your observation of ~5.7 GB, but not quite close enough. Some other things to keep in mind:

    The growth stopped but the usage did not shrink.

    This could just be space that was allocated at the datastore level that isn't getting repacked/reclaimed. We'd need to look at what's actually still occupying space to understand the situation. Some of this depends on the actual persistence store (ESE/KVS vs. the dictionary-based state provider). It's also possible that the ActorIds you're generating changed somehow as part of your upgrade, so that the new code isn't able to reference the "old" ActorIds (but that feels unlikely).
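    As a rough illustration of the gap being discussed, here is the arithmetic comparing the observed store file size with the ~3.4 GB estimate from earlier in the answer. This is just a subtraction, not a measurement: the slack could be unreclaimed datastore space, store overhead (indexes, logs), or orphaned state, and you'd need to inspect the store itself to tell which.

    ```python
    # Gap check using the numbers from the question/answer.
    observed_file_gb = 5.66   # ActorStateStore file size seen on disk
    expected_state_gb = 3.4   # per-node estimate with TargetReplicaSetSize 3

    slack_gb = observed_file_gb - expected_state_gb
    slack_fraction = slack_gb / observed_file_gb
    print(f"~{slack_gb:.2f} GB (~{slack_fraction:.0%}) unaccounted for")
    ```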