bash shell zfs

Kill 'zfs send' command in non-interactive bash pipeline after finding keyword in stream


I need code that reads the output of a zfs send command piped into zstream dump -v and extracts the crypt_keydata from it. The crypt_keydata block is a handful of lines that appear near, but not at, the start of the send stream, and it should be stored in a variable. The goal is to compare the remote and local crypt_keydata to verify that the remote key has not changed before pulling in a snapshot for backup purposes.

I started with:

crypt_keydata_backup=$(stdbuf -oL zfs send -w -p "${backup_snapshot}" | stdbuf -oL zstreamdump -d | stdbuf -oL awk '/end crypt_keydata/{exit}1' | stdbuf -oL sed -n '/crypt_keydata/,$ {s/^[ \t]*//; p}')

This works very well in interactive bash on the command line (not sure what the correct wording is). However, when the script containing the above snippet is invoked by systemd, the zfs send command is never terminated, so the disks sit at 100% read until zfs send finishes, which for a large dataset can take a LONG time.

A (modified) solution suggested by supreme overlord ChatGPT is given below:

crypt_keydata_backup=""

while IFS= read -r line; do
    crypt_keydata_backup+="${line}"$'\n'
    if [[ "${line}" == *"end crypt_keydata"* ]]; then
        kill -SIGTERM "$(cat /tmp/sub_proc.pid)" &>/dev/null
        rm -f /tmp/sub_proc.pid
        break
    fi
done< <(stdbuf -oL zfs send -w -p "${backup_snapshot}" | stdbuf -oL zstream dump -v & echo $! > /tmp/sub_proc.pid)

# Modify the saved data to only extract the crypt_keydata
crypt_keydata_backup=$(sed -n '/crypt_keydata/,$ {s/^[ \t]*//; p}' <<< "${crypt_keydata_backup}")

This works whether it is run from the CLI or by systemd. I have two questions:

  1. Is this a correct/efficient/robust way of achieving my goal?
  2. I do not really understand WHY this works in both the CLI and systemd, whereas the first example only works (or seems to) in the CLI.

Hopefully someone has some pointers to further my understanding and/or code examples I can try.

Edit: stdbuf -oL seems to speed up process termination in both the CLI and systemd. If a solution works (better) without it, there is no need to keep it.


Solution
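
A likely reason the first pipeline behaves differently under systemd (an inference, not something confirmed in the question): systemd services run with IgnoreSIGPIPE=yes by default, and an ignored signal stays ignored in every child process. In an interactive shell, once awk exits, the next write by the upstream command raises SIGPIPE and kills it, and the failure cascades back to zfs send. With SIGPIPE ignored, that write only fails with EPIPE, and a program that does not check for write errors keeps running. The sketch below reproduces the effect with yes and head standing in for the real commands; the function names are made up for illustration.

```shell
#!/usr/bin/env bash

# Exit status of the producer (`yes`) when its reader (`head`) exits early,
# with SIGPIPE ignored the way a default systemd service would have it.
producer_status_sigpipe_ignored() {
  (
    trap '' PIPE                      # ignored dispositions are inherited by children
    yes 2>/dev/null | head -n 1 >/dev/null
    echo "${PIPESTATUS[0]}"           # status of `yes`, the first pipeline member
  )
}

# Same pipeline, using whatever SIGPIPE disposition this shell inherited.
producer_status_inherited() {
  yes 2>/dev/null | head -n 1 >/dev/null
  echo "${PIPESTATUS[0]}"
}
```

In an interactive shell, producer_status_inherited typically prints 141 (128 plus the SIGPIPE signal number), meaning yes was killed by the signal. producer_status_sigpipe_ignored prints an ordinary error status instead, because yes only stopped after noticing the failed write itself. If this is the cause, setting IgnoreSIGPIPE=no in the unit's [Service] section might make the original pipeline terminate as it does interactively, but killing the processes explicitly does not depend on that setting.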

  • If we split zfs send and zstream dump into two separate process substitutions, we can track the PID of each process individually.

    Note that the code below uses syntax introduced with bash 4.3; it certainly will not work with bash 3.2 as shipped in macOS.

    #!/usr/bin/env bash
    backup_snapshot=$1 # Need to get this from somewhere
    
    exec {zfs_send_fd}< <(exec stdbuf -o0 zfs send -w -p "$backup_snapshot")
    zfs_send_pid=$!
    
    exec {zstream_dump_fd}< <(exec stdbuf -oL zstream dump <&"${zfs_send_fd}")
    zstream_dump_pid=$!
    
    # close this shell's copy of the read end of zfs send's output,
    # so that zstream dump holds the only remaining handle to it
    exec {zfs_send_fd}>&-
    
    reading=0
    crypt_data=( )
    while IFS= read -r line; do
      if (( reading == 0 )) && [[ $line =~ crypt_keydata ]]; then
        reading=1
      fi
      if (( reading )); then
        crypt_data+=( "$line" )
        if [[ $line =~ 'end crypt_keydata' ]]; then
          kill "$zfs_send_pid" "$zstream_dump_pid"
          break
        fi
      fi
    done <&"${zstream_dump_fd}"
    
    # write collected data -- redirect this to a file or such if/as appropriate
    printf '%s\n' "${crypt_data[@]}"
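
To get from the collected lines to the comparison the question is after, the extraction can be wrapped in a function and the two results compared as strings. This is a minimal sketch under assumptions: extract_crypt_keydata and same_crypt_keydata are hypothetical helper names, and the input is assumed to be zstream dump text on stdin with the same crypt_keydata and end crypt_keydata markers used above.

```shell
#!/usr/bin/env bash

# Hypothetical helper: read zstream dump text on stdin and print the
# crypt_keydata block with leading whitespace stripped, stopping at the
# end marker. Returns 1 if the end marker is never seen.
extract_crypt_keydata() {
  local line reading=0
  while IFS= read -r line; do
    if (( ! reading )) && [[ $line == *crypt_keydata* ]]; then
      reading=1
    fi
    if (( reading )); then
      # strip leading spaces and tabs, like the sed expression in the question
      printf '%s\n' "${line#"${line%%[![:space:]]*}"}"
      if [[ $line == *'end crypt_keydata'* ]]; then
        return 0
      fi
    fi
  done
  return 1
}

# Hypothetical helper: succeed only if both blocks are non-empty and identical.
same_crypt_keydata() {
  [[ -n $1 && $1 == "$2" ]]
}
```

The function only stops reading; it does not stop the producers. When it is fed from the file descriptor in the script above, the kill of zfs send and zstream dump still has to follow, and an empty result should be treated as a failed check rather than as a match.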