kubernetes

draining a node - surge replica up and not down


When draining a node, I can specify a PodDisruptionBudget telling the cluster how far it can dip down with the replicas:

https://kubernetes.io/docs/concepts/workloads/pods/disruptions/#pdb-example

All the examples and configs point towards "when I have 2 replicas, I can tell k8s to allow it to go down to 1 and guarantee uptime"
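The documented pattern looks roughly like this: a minimal PDB sketch, assuming a Deployment whose pods are labeled `app: my-app` (a hypothetical name), which guarantees at least one pod stays up during a voluntary disruption such as a drain:

```yaml
apiVersion: policy/v1
kind: PodDisruptionBudget
metadata:
  name: my-app-pdb
spec:
  minAvailable: 1        # never evict below 1 running pod
  selector:
    matchLabels:
      app: my-app        # must match the Deployment's pod labels
```

Note that this only sets a floor: with `replicas: 2` the eviction API will let the deployment dip to 1, but nothing here makes it surge above 2.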

What I have not found thus far is how I can allow the cluster to surge upwards.

Let's say I have 1 replica. I can tolerate having multiple replicas during upgrades of the service:

I have a RollingUpdate strategy where I instruct the cluster to surge by +1 replica (1 is running, another pod is added, so we are now at 2 instances; once the new pod is ready, the first instance is taken down):

spec:
  replicas: 1
  strategy:
    type: RollingUpdate
    rollingUpdate:
      maxSurge: 1
      maxUnavailable: 0

How can I tell the cluster to use the same strategy when draining a node? First try to schedule a replacement pod on another node, wait until that pod is running, and only then tear down the pod on the node I want to drain.

a) Am I missing something? b) Is there a reason this does not work for planned disruptions, but is an available feature when upgrading an application?


Solution

  • You are not missing anything; this is simply how Kubernetes handles upgrades and node drains differently by default. The RollingUpdate strategy with maxSurge works as designed: Kubernetes temporarily increases the number of replicas as part of the upgrade process. During a node drain, by contrast, Kubernetes focuses on evicting pods safely. While the PDB ensures a minimum availability during the disruption, it does not automatically scale up the replicas during the drain process.

    To achieve your desired behavior, manually scale your replicas up temporarily first before draining the node. Once the node is drained, scale the replicas back down.

    For further reference, you can also check the related thread.
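The surge-then-drain workaround above can be sketched as a command sequence. This is a sketch, not an official recipe; `my-app` and `node-1` are hypothetical names, and it assumes the PDB from the question (with `minAvailable` set to the original replica count) is in place:

```shell
# Temporarily surge the Deployment before draining (my-app is a placeholder name)
kubectl scale deployment my-app --replicas=2

# Wait until the extra replica is actually Ready
kubectl rollout status deployment my-app

# Evict pods from the node; the PDB keeps the original replica count serving
kubectl drain node-1 --ignore-daemonsets

# After maintenance is done and the node is back, scale down again
kubectl uncordon node-1
kubectl scale deployment my-app --replicas=1
```

The key ordering is the same as maxSurge during an upgrade: capacity is added and confirmed Ready before any existing pod is removed.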