kubernetes livenessprobe

Kubernetes liveness probe exec command: environment variables in an if statement not working


I am having difficulty getting a Kubernetes livenessProbe exec command to work with environment variables. My goal is for the liveness probe to monitor memory usage on the pod and also perform an HTTP health check.

"If container memory usage exceeds 90% of the resource limit OR the HTTP request to /health does not return 200, then the probe should fail."

The liveness probe is configured as follows:


livenessProbe:
  exec:
    command:
    - sh
    - -c
    - |-
      "used=$(awk '{ print int($1/1.049e+6) }' /sys/fs/cgroup/memory/memory.usage_in_bytes);
      thresh=$(awk '{ print int( $1 / 1.049e+6 * 0.9 ) }' /sys/fs/cgroup/memory/memory.limit_in_bytes);
      health=$(curl -s -o /dev/null --write-out "%{http_code}" http://localhost:8080/health);
      if [[ ${used} -gt ${thresh} || ${health} -ne 200 ]]; then exit 1; fi"
  initialDelaySeconds: 240
  periodSeconds: 60
  failureThreshold: 3
  timeoutSeconds: 10

If I exec into the (ubuntu) pod and run these commands they all work fine and do the job.

But when deployed as a livenessProbe the pod is constantly failing with the following warning:

Events:
  Type     Reason     Age                  From     Message
  ----     ------     ----                 ----     -------
  Warning  Unhealthy  14m (x60 over 159m)  kubelet  (combined from similar events): Liveness probe failed: sh: 4: used=1608;
thresh=2249;
health=200;
if [[  -gt  ||  -ne 200 ]]; then exit 1; fi: not found

It looks as if the initial commands to probe memory and curl the health check endpoint all worked and populated the variables, but those variables were then empty when substituted into the if statement, so the probe never passes.

Any idea as to why? Or how this could be configured to work properly? I know it's a little bit convoluted. Thanks in advance.


Solution

  • It turns out that both answers by @Andrew McGuinness AND @OreOP were crucial to my final properly working solution which was:

      livenessProbe:
        exec:
          command:
          - /bin/bash
          - -c
          - |-
            used=$(awk '{ print int($1/1.049e+6) }' /sys/fs/cgroup/memory/memory.usage_in_bytes);
            thresh=$(awk '{ print int( $1 / 1.049e+6 * 0.9 ) }' /sys/fs/cgroup/memory/memory.limit_in_bytes);
            health=$(curl -s -o /dev/null --write-out "%{http_code}" http://localhost:8080/health);
            if [[ ${used} -gt ${thresh} || ${health} -ne 200 ]]; then exit 1; fi
        initialDelaySeconds: 240
        periodSeconds: 60
        failureThreshold: 3
        timeoutSeconds: 10
    

    I crucially needed Andrew's advice about removing the quotes, because the |- marker already tells the YAML parser that this is a multi-line string. I think that was actually what I was asking. But @OreOP was absolutely correct about my confusion between bash and sh, and about which one accepts a double-bracket [[ conditional ]] statement.
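The effect of the extra quotes can be reproduced outside Kubernetes. The following is a minimal sketch, assuming any POSIX sh is on the PATH; the variable names (quoted_out, plain_out and so on) are only for illustration. A script wrapped in double quotes is parsed as a single word: the shell expands the substitutions first (with ${x} still unset, hence empty) and then tries to run the whole string as a command name, which is the same "not found" pattern seen in the probe event.

```shell
#!/bin/sh
# Quoted script: the entire text is ONE word. $(echo 5) and ${x} are expanded
# inside the quotes (x is not set yet, so ${x} is empty), then the resulting
# string is looked up as a command name and is not found.
quoted_out=$(sh -c '"x=$(echo 5); echo value=${x}"' 2>&1)
quoted_status=$?

# Same script without the outer quotes: an assignment followed by a command.
plain_out=$(sh -c 'x=$(echo 5); echo value=${x}' 2>&1)
plain_status=$?

echo "quoted: status=${quoted_status} output=${quoted_out}"
echo "plain:  status=${plain_status} output=${plain_out}"
```

The quoted run exits with status 127 (command not found), while the unquoted run prints value=5 and exits with 0. The exact wording of the error message differs between shells such as dash and bash.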

    By the way, I completely agree with both that this isn't ultimately the correct solution to the deeper problem at hand, but for various other reasons my team has requested this patch as a temporary measure. The memory.limit_in_bytes value read in my script reflects the resource limits set in my Kubernetes deployment YAML.
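On the bash versus sh point: for images that ship without bash, the same condition could probably be written with the POSIX [ ] test instead of [[ ]]. This is only a sketch, not the configuration I deployed. probe_ok is a hypothetical helper name, and in a real probe its three arguments would come from the awk and curl commands shown in the solution.

```shell
#!/bin/sh
# probe_ok USED THRESH HTTP_CODE  (hypothetical helper, for illustration only)
# Succeeds only when USED <= THRESH and HTTP_CODE equals 200.
# Written as a positive check so that an empty or non-numeric value makes
# the test fail, and therefore counts as unhealthy.
probe_ok() {
    [ "$1" -le "$2" ] && [ "$3" -eq 200 ]
}

# Example calls using the values from the failing event (1608, 2249, 200).
probe_ok 1608 2249 200 && echo "healthy"
probe_ok 2300 2249 200 || echo "unhealthy: memory above threshold"
probe_ok 1608 2249 503 || echo "unhealthy: bad HTTP status"
```

In the probe itself the last line of the script would be the call to the helper, so that its exit status becomes the exit status of the exec command.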