bash shell posix sigterm

How to capture STDOUT, STDERR and process PID, without creating zombies


I'm wrapping a terraform binary in a script, as a part of an enterprise solution. Therefore I need to take care of:

Currently, the core construct of the script looks like this:

#!/bin/bash
...
...
terraform "$@" > >(tee "${STDOUT_LOG}") 2> >(tee "${STDERR_LOG}" >&2) & TF_PID="$!"
wait "$TF_PID"
EXIT_CODE="$?"
...
wait
exit "$EXIT_CODE"

This script is called several hundred times in one container. We've noticed that it leaves zombie processes behind: the subshells in which the tee commands run.

Adding a general wait before exiting the script doesn't help; the shell doesn't wait for these processes, so they are never reaped. I couldn't find much about the internals of process substitution. Do you have a hint about what might be going on here?

EDIT:

> ps aux --forest
  11295 ?        S      0:00      |       \_ /bin/bash /home/jenkins/workspace/build-1629@tmp/terraform.sh init
  11376 ?        Sl     0:01      |       |   \_ terraform init
  11377 ?        S      0:00      |       |       \_ /bin/bash /home/jenkins/workspace/build-1629@tmp/terraform.sh init
  11379 ?        S      0:00      |       |       |   \_ /usr/bin/coreutils --coreutils-prog-shebang=tee /usr/bin/tee -a /home/jenkins/workspace/build-1629/src/vnet-01/stdout.log
  11378 ?        S      0:00      |       |       \_ /bin/bash /home/jenkins/workspace/build-1629@tmp/terraform.sh init
  11380 ?        S      0:00      |       |           \_ /usr/bin/coreutils --coreutils-prog-shebang=tee /usr/bin/tee -a /home/jenkins/workspace/build-1629/src/vnet-01/stderr.log

and after a few moments:

> ps aux
...
  11377 ?        Z      0:00 [terraform.sh] <defunct>
  11378 ?        Z      0:00 [terraform.sh] <defunct>
...

Solution

  • You can skip process substitution, roll up your sleeves and do it yourself. In your ps output the tee subshells are children of terraform rather than of the script, so the script's wait cannot reap them. With explicit FIFOs, all processes are children of the current shell, so it can track every lifetime. Also, variables don't have to scream UPPERCASE. I think with coproc you could get away with one FIFO fewer.

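    {
      # set up the stdout and stderr FIFOs in a private temporary directory
      fifo_dir=$(mktemp -d)
      stdout_fifo="$fifo_dir/stdout"
      stderr_fifo="$fifo_dir/stderr"
      mkfifo "$stdout_fifo" "$stderr_fifo"
      trap 'rm -rf "$fifo_dir"' EXIT
    }
    {
      # start terraform and both tee readers as direct children of this shell
      terraform "$@" >"$stdout_fifo" 2>"$stderr_fifo" &
      tf_pid=$!
      tee "$STDOUT_LOG" <"$stdout_fifo" &
      tee_stdout_pid=$!
      # keep the stderr copy on stderr, as in the original script
      tee "$STDERR_LOG" <"$stderr_fifo" >&2 &
      tee_stderr_pid=$!
    }
    {
      # wait for terraform first to get its exit code, then reap both tees
      wait "$tf_pid"
      tf_exit_code="$?"
      wait "$tee_stdout_pid"
      wait "$tee_stderr_pid"
    }
    exit "$tf_exit_code"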
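
The same FIFO pattern can be wrapped in a function and tried without terraform. This is only a minimal sketch: the function name run_logged, the FIFO file names, and the stand-in sh -c command are assumptions made for illustration and are not part of the original script.

```shell
#!/bin/bash
# Hypothetical helper (name is an assumption): run a command, tee its
# stdout and stderr into log files, and return the command's exit code
# only after every child process has been waited for.
run_logged() {
  local stdout_log="$1" stderr_log="$2"
  shift 2
  local fifo_dir cmd_pid out_pid err_pid rc

  # Private directory for the two FIFOs.
  fifo_dir=$(mktemp -d) || return 1
  mkfifo "$fifo_dir/out" "$fifo_dir/err" || { rm -rf "$fifo_dir"; return 1; }

  # The command and both tee readers are direct children of this shell.
  "$@" >"$fifo_dir/out" 2>"$fifo_dir/err" &
  cmd_pid=$!
  tee "$stdout_log" <"$fifo_dir/out" &
  out_pid=$!
  tee "$stderr_log" <"$fifo_dir/err" >&2 &
  err_pid=$!

  # Collect the command's exit code, then reap both tee processes.
  rc=0
  wait "$cmd_pid" || rc=$?
  wait "$out_pid" "$err_pid"

  rm -rf "$fifo_dir"
  return "$rc"
}

# Stand-in for terraform: writes one line to each stream and exits with 3.
log_dir=$(mktemp -d)
exit_code=0
run_logged "$log_dir/stdout.log" "$log_dir/stderr.log" \
  sh -c 'echo to-stdout; echo to-stderr >&2; exit 3' || exit_code=$?
echo "exit code: $exit_code"
```

Because the function waits on all three PIDs before returning, nothing it started should remain for the caller to reap. The log directory is left in place so the two files can be inspected afterwards.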