I am fairly new to Helm and Kubernetes, so I'm not sure if this is a bug or if I'm doing something wrong. I did look everywhere for an answer before posting, but I can't find anything that answers my question.
I have a deployment which uses a persistent volume and an init container. I pass it values to let Helm know whether the init container image, the main application container image, or both have changed.
Possibly relevant, possibly not: I deploy one Deployment for each of a range of web sources (which I call collectors). I don't know if this last part is relevant, but then, if I did, I probably wouldn't be here.
When I run
helm upgrade --install my-release helm_chart/ --values values.yaml --set init_image_tag=$INIT_IMAGE_TAG --set image_tag=$IMAGE_TAG
The first time, everything works fine. However, when I run it a second time with INIT_IMAGE_TAG the same but IMAGE_TAG changed, the Deployment never finishes rolling out the new image.
Expected behaviour: the main container is updated to the new image.
My values.yaml just contains a list called collectors
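For illustration, it looks something like this (the collector names here are made-up placeholders):
collectors:
  - collector-a
  - collector-b
  - collector-c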
My template is just:
{{ $env := .Release.Namespace }}
{{ $image_tag := .Values.image_tag }}
{{ $init_image_tag := .Values.init_image_tag }}
{{- range $colname := .Values.collectors }}
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: {{ $colname }}-claim
spec:
  accessModes:
    - ReadWriteOnce
  storageClassName: ebs-sc
  resources:
    requests:
      storage: 10Gi
---
apiVersion: apps/v1
kind: Deployment
metadata:
  name: {{ $colname }}-ingest
  labels:
    app: {{ $colname }}-ingest
spec:
  replicas: 1
  selector:
    matchLabels:
      app: {{ $colname }}-ingest
  template:
    metadata:
      labels:
        app: {{ $colname }}-ingest
    spec:
      securityContext:        # fsGroup must be nested under securityContext, not directly under the pod spec
        fsGroup: 1000
      containers:
        - name: {{ $colname }}-main
          image: xxxxxxx.dkr.ecr.eu-west-1.amazonaws.com/main_image:{{ $image_tag }}
          env:
            - name: COLLECTOR
              value: {{ $colname }}
          volumeMounts:
            - name: storage
              mountPath: /home/my/dir
      initContainers:
        - name: {{ $colname }}-init
          image: xxxxxxx.dkr.ecr.eu-west-1.amazonaws.com/init_image:{{ $init_image_tag }}
          volumeMounts:
            - name: storage
              mountPath: /home/my/dir
          env:
            - name: COLLECTOR
              value: {{ $colname }}
      volumes:
        - name: storage
          persistentVolumeClaim:
            claimName: {{ $colname }}-claim
---
{{ end }}
Output of helm version:
version.BuildInfo{Version:"v3.2.0-rc.1", GitCommit:"7bffac813db894e06d17bac91d14ea819b5c2310", GitTreeState:"clean", GoVersion:"go1.13.10"}
Output of kubectl version:
Client Version: version.Info{Major:"1", Minor:"17", GitVersion:"v1.17.3", GitCommit:"06ad960bfd03b39c8310aaf92d1e7c12ce618213", GitTreeState:"clean", BuildDate:"2020-02-11T18:14:22Z", GoVersion:"go1.13.6", Compiler:"gc", Platform:"linux/amd64"}
Server Version: version.Info{Major:"1", Minor:"14+", GitVersion:"v1.14.9-eks-f459c0", GitCommit:"f459c0672169dd35e77af56c24556530a05e9ab1", GitTreeState:"clean", BuildDate:"2020-03-18T04:24:17Z", GoVersion:"go1.12.12", Compiler:"gc", Platform:"linux/amd64"}
Cloud Provider/Platform (AKS, GKE, Minikube etc.): EKS
Does anyone know whether this is a bug or whether I'm misusing Helm/Kubernetes somehow?
Thanks
When you update a Deployment, it goes through a couple of steps:
1. A new ReplicaSet is created with the updated pod template.
2. Pods for the new ReplicaSet are started, and they must pass their readiness checks.
3. Only then are the Pods from the old ReplicaSet scaled down and terminated.
The important detail here is that there is (intentionally) a state where both old and new pods are running.
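That overlap comes from the Deployment's default RollingUpdate strategy. As a rough sketch, the defaults look like this if you write them out explicitly:
spec:
  strategy:
    type: RollingUpdate
    rollingUpdate:
      maxSurge: 25%        # extra Pods allowed above the desired replica count during an update
      maxUnavailable: 25%  # Pods allowed to be unavailable during an update
With replicas: 1, maxSurge rounds up to 1 and maxUnavailable rounds down to 0, so Kubernetes insists on starting the new Pod before it will terminate the old one.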
In the example you show, you mount a PersistentVolumeClaim with a ReadWriteOnce access mode. This doesn't really work well with Deployments. While the old Pod is running, it owns the PVC mount, which will prevent the new Pod from starting up, which will prevent the Deployment from progressing. (This isn't really specific to Helm and isn't related to having an initContainer or not.)
There are a couple of options here:
Don't store data in a local volume. This is the best path, though it involves rearchitecting your application. Store data in a separate database container, if it's relational-type data (e.g., prefer a PostgreSQL container to SQLite in a volume); or if you have access to network storage like Amazon S3, keep things there. That completely avoids this problem and will let you run as many replicas as you need.
Use a ReadWriteMany volume. A persistent volume has an access mode. If you can declare the volume as ReadWriteMany, then multiple pods can mount it and this scenario will work. Many of the more common volume types don't support this access mode, though (AWSElasticBlockStore and HostPath notably are only ReadWriteOnce).
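For instance, on EKS a ReadWriteMany claim is typically backed by something like the EFS CSI driver rather than EBS. A minimal sketch of the claim, assuming a ReadWriteMany-capable storage class named efs-sc exists in the cluster (it is not part of the original chart):
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: {{ $colname }}-claim
spec:
  accessModes:
    - ReadWriteMany          # lets the old and new Pods mount the volume at the same time
  storageClassName: efs-sc   # assumed RWX-capable class (e.g. backed by the EFS CSI driver)
  resources:
    requests:
      storage: 10Gi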
Set the Deployment strategy to Recreate. You can configure how a Deployment manages updates. If you change to a Recreate strategy
apiVersion: apps/v1
kind: Deployment
spec:
  strategy:
    type: Recreate
then the old Pods will be deleted first. This will break zero-downtime upgrades, but it will allow this specific case to proceed.