
What happens when we create a StatefulSet with many replicas and one PVC in Kubernetes?


I'm new to Kubernetes and this topic is confusing for me. I've learned that a StatefulSet doesn't share the PV and that each replica has its own PV. On the other hand, I have seen examples where a single PVC was used in a StatefulSet with many replicas. So my question is: what happens then? PVCs bind to PVs 1:1, so one PVC can only bind to one PV, but each replica should have its own PV. How is it possible to have one PVC in a StatefulSet in this scenario?


Solution

  • You should usually use a volume claim template with a StatefulSet. As you note in the question, this will create a new PersistentVolumeClaim (and a new PersistentVolume) for each replica. Data is not shared, except to the extent the container process knows how to replicate data between its replicas. If a StatefulSet Pod is deleted and recreated, it will come back with the same underlying PVC and the same data, even if it is recreated on a different Node.

    spec:
      volumeClaimTemplates:
        - metadata:
            name: data
          spec:
            accessModes: [ReadWriteOnce]
            resources:
              requests:
                storage: 1Gi
      template:
        spec:
          containers:
            - name: name
              volumeMounts:
                - name: data
                  mountPath: /data
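
    To make the per-replica claims concrete: the StatefulSet controller names each generated PVC `<claim template name>-<StatefulSet name>-<ordinal>`. As a rough sketch, assuming a hypothetical StatefulSet named `web` with 3 replicas (neither the name nor the replica count comes from the question), the claim created for the first replica would look roughly like this:

    ```yaml
    # Illustrative only: the PVC generated for Pod web-0 of a hypothetical
    # StatefulSet named "web" that uses the "data" claim template above.
    # Pods web-1 and web-2 would get data-web-1 and data-web-2 respectively.
    apiVersion: v1
    kind: PersistentVolumeClaim
    metadata:
      name: data-web-0
    spec:
      accessModes:
        - ReadWriteOnce
      resources:
        requests:
          storage: 1Gi
    ```

    Each of these claims binds to its own PV, so the 1:1 PVC-to-PV rule still holds; there are simply as many claims as replicas.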
    

    You can also manually create a PVC and attach it to the StatefulSet Pods,

    # not recommended -- one PVC shared across all replicas
    spec:
      template:
        spec:
          volumes:
            - name: data
              persistentVolumeClaim:
                claimName: manually-created-pvc
          containers:
            - name: name
              volumeMounts:
                - name: data
                  mountPath: /data
    

    but in this case the single PVC/PV will be shared across all of the replicas. This often doesn't work well: database containers, for example, often have explicit checks that their storage isn't shared, and sharing a volume opens up a range of concurrency problems. It can also prevent Pods from starting up, since the volume types that are straightforward to get generally support only the ReadWriteOnce access mode; to get ReadWriteMany you need to additionally configure something like an NFS server outside the cluster.
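
    If you do go the shared-volume route anyway, the manually created claim would need to request ReadWriteMany. A minimal sketch, assuming your cluster has a storage class backed by something like NFS that supports this mode; the class name `nfs-client` is a hypothetical placeholder and the size is arbitrary:

    ```yaml
    apiVersion: v1
    kind: PersistentVolumeClaim
    metadata:
      name: manually-created-pvc    # matches claimName in the Pod template above
    spec:
      accessModes:
        - ReadWriteMany             # lets Pods on different Nodes mount the volume together
      storageClassName: nfs-client  # hypothetical; use a class your cluster actually provides
      resources:
        requests:
          storage: 1Gi
    ```

    Even with ReadWriteMany, the application itself still has to tolerate several processes reading and writing the same directory.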