bash, google-apps-script, parallel-processing, background-process, clasp

Run multiple Google Apps Script clasp commands in parallel using a Bash script


I have several hundred Google Apps Script projects and a variety of Bash scripts for managing them using the clasp tool (a Node.js app). Many of the scripts need to run clasp pull first to bring the projects down locally before acting on the local files, so I have a script that loops through the local clasp project folders and runs clasp pull in each one. The loop iterates through the directories sequentially, so at 3-4 seconds per pull it takes 5-6 minutes per 100 projects.
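
Roughly, the sequential version of that loop looks like this (simplified sketch):

find . -name '.clasp.json' |
while read file; do
    (
        cd "$(dirname "$file")"
        clasp pull
    )
done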

My goal is to be able to run the clasp pull commands in parallel so that they all start at the same time, and to be able to know which projects were successfully pulled vs which projects failed to be pulled.

Given a directory structure like this:

├── project-1
│   ├── .clasp.json
│   ├── .claspignore
│   ├── _main.js
│   └── appsscript.json
├── project-2
│   ├── .clasp.json
│   ├── .claspignore
│   ├── _main.js
│   └── appsscript.json
├── project-3
│   ├── .clasp.json
│   ├── .claspignore
│   ├── _main.js
│   └── appsscript.json
└── pull_all.sh

And this pull_all.sh Bash script:

#!/bin/bash

# use Node 14.17.5 to prevent "Error: Looks like you are offline." errors
# (see https://github.com/google/clasp/issues/872)
[ -s "/usr/local/opt/nvm/nvm.sh" ] && . "/usr/local/opt/nvm/nvm.sh"
nvm install 14.17.5
nvm use 14.17.5

find . -name '.clasp.json' | 
while read file; do
    (
        cd "$(dirname "$file")"
        project_dir_name="$(basename "$(pwd)")"
        echo "Pulling project ($project_dir_name)"
        clasp pull
    ) &
done

When running this script, it outputs the "Pulling project" line for each directory, then returns to a shell prompt, implying that the script has finished executing. But then, without any user input, 3-4 seconds later it shows the output of all the clasp pull commands (apparently running in parallel, because some of the output is out of order/overlapping), then appears to hang and does not give a new shell prompt. At this point I have to press Ctrl+C to terminate the script.

The complete output ends up looking like this:

$ ./pull_all.sh
v14.17.5 is already installed.
Now using node v14.17.5 (npm v6.14.14)
Now using node v14.17.5 (npm v6.14.14)
Pulling project (project-3)
Pulling project (project-2)
Pulling project (project-1)
$
Cloned 2 files.
⠙ Pulling files…└─ appsscript.json
└─ _main.js
Cloned 2 files.
└─ _main.js
└─ appsscript.json
Cloned 2 files.
└─ _main.js

To force one of the pulls to fail, I can change the scriptId in any of the .clasp.json files to an invalid script ID. In that case I do see the expected output of:

Could not find script.
Did you provide the correct scriptId?
Are you logged in to the correct account with the script?

... but it's still mixed in with the rest of the output, and it's not clear which project it came from.
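
For reference, forcing the failure is just a matter of pointing one project's .clasp.json at a made-up ID, e.g. (after backing up the original file):

cat > project-2/.clasp.json <<'EOF'
{
  "scriptId": "SOME_INVALID_SCRIPT_ID"
}
EOF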

How can I make it so that:

  1. The script does not return to a shell prompt while it is still executing.
  2. The script outputs a line indicating the success or failure of each clasp pull operation, referenced by the directory name of the project (where the .clasp.json file was found).
  3. Bonus: suppress the output of clasp pull so the script only shows the success or failure result of each project (referenced by the directory name).

Note: I've used clasp pull as the example command, but a valid solution should let me run any clasp command as a background process in a Bash while loop, including but not limited to clasp push, clasp deploy, etc.


Solution

  • I'd suggest the following solution:

    #!/usr/bin/env bash
    
    # use Node 14.17.5 to prevent "Error: Looks like you are offline." errors
    # (see https://github.com/google/clasp/issues/872)
    [ -s "/usr/local/opt/nvm/nvm.sh" ] && . "/usr/local/opt/nvm/nvm.sh"
    nvm install 14.17.5
    nvm use 14.17.5
    
    # Check and process command line
    if (( $# < 1 )); then
        echo "Usage: $(basename "$0") ACTION [ARG]..."
        exit 2
    fi
    action="$1"
    args=("${@:2}")
    
    # Define cleanup handler, create temporary log directory
    trap '[[ -n "$(jobs -p)" ]] && kill -- -$$; [[ -n "${logdir}" ]] && rm -rf "${logdir}"' EXIT
    logdir=$(mktemp -d)
    
    # Start specified action for each project
    declare -A pid_pro_map=() pid_log_map=()
    readarray -t files < <(find . -name '.clasp.json' -printf "%P\n" | sort -V)
    for file in "${files[@]}"; do
        project=$(dirname "${file}")
        logfile=$(mktemp -p "${logdir}")
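        # run the clasp action inside the project dir; both stdout and stderr go to the per-project logfile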
        ( cd "${project}" && clasp "${action}" "${args[@]}" ) &>"${logfile}" &
        pid=$!; pid_pro_map[${pid}]="${project}"; pid_log_map[${pid}]="${logfile}"
        echo -e "Started action '\e[1m${action}\e[0m' for project '\e[1m${project}\e[0m' (pid ${pid})"
    done
    
    # Wait for background jobs to finish and report results
    echo -e "\nWaiting for background jobs to finish...\n"
    jobs_done=0; jobs_total=${#files[@]}
    while true; do
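        # wait for the next background job to finish; 'wait -n -p var' needs bash >= 5.1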
        wait -n -p pid; result=$?
        [[ -z "${pid}" ]] && break
        jobs_done=$((jobs_done + 1))
        if (( ${result} == 0 )); then
            echo -e "Action '\e[1m${action}\e[0m' for project '\e[1m${pid_pro_map[${pid}]}\e[0m' (pid ${pid}) (${jobs_done}/${jobs_total}): \e[1;32mSUCCESS\e[0m"
        else
            echo -e "Action '\e[1m${action}\e[0m' for project '\e[1m${pid_pro_map[${pid}]}\e[0m' (pid ${pid}) (${jobs_done}/${jobs_total}): \e[1;31mFAILURE\e[0m"
            cat "${pid_log_map[${pid}]}"
        fi
    done
    

    Features:

    - Runs whatever clasp action is given on the command line (pull, push, deploy, ...) and passes any extra arguments straight through to clasp.
    - Starts one background job per directory containing a .clasp.json file, so all projects are processed in parallel.
    - Captures each job's output in its own temporary logfile, so the terminal only shows which jobs were started and their results.
    - Prints a per-project SUCCESS/FAILURE line (with a done/total counter) as each job finishes, and dumps the captured log for failed projects.
    - Cleans up the temporary logs and kills any still-running jobs on exit.

    Requirements:

    - bash >= 5.1 (for wait -n -p)
    - GNU find (for -printf "%P\n")
    - mktemp with -d/-p support, plus nvm and clasp as in the original script
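
    If desired, a bash-version guard could be added near the top of the script (a sketch, not part of the original answer):

    # 'wait -n -p' needs bash 5.1+; bail out early on older shells
    if (( BASH_VERSINFO[0] < 5 || (BASH_VERSINFO[0] == 5 && BASH_VERSINFO[1] < 1) )); then
        echo "This script requires bash >= 5.1 (for 'wait -n -p')." >&2
        exit 1
    fi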

    Sample output:

    (screenshot of a sample run omitted)
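
    As a usage sketch (the filename clasp_all.sh is only an illustrative choice, and the pids below are made up):

    # save the script above as e.g. clasp_all.sh and make it executable
    ./clasp_all.sh pull           # runs 'clasp pull' in every project that has a .clasp.json
    ./clasp_all.sh push --force   # extra arguments are passed through to clasp unchanged

    # a run prints lines along the lines of:
    #   Started action 'pull' for project 'project-1' (pid 12345)
    #   ...
    #   Action 'pull' for project 'project-1' (pid 12345) (1/3): SUCCESS
    #   Action 'pull' for project 'project-2' (pid 12346) (2/3): FAILURE
    #   (followed by the captured clasp output for project-2)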


    In response to a comment, here is a possible tweak to limit the number of concurrent background jobs being spawned:

    # Start specified action for each project
    max_jobs=25; poll_delay="0.1s"
    declare -A pid_pro_map=() pid_log_map=()
    readarray -t files < <(find . -name '.clasp.json' -printf "%P\n" | sort -V)
    for file in "${files[@]}"; do
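        # throttle: if a job limit is set, wait until the number of running jobs drops below it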
        if (( ${max_jobs} > 0 )); then
            while jobs=$(jobs -r -p | wc -l) && (( ${jobs} >= ${max_jobs} )); do
                sleep "${poll_delay}"
            done
        fi
        project=$(dirname "${file}")
        logfile=$(mktemp -p "${logdir}")
        ( cd "${project}" && clasp "${action}" "${args[@]}" ) &>"${logfile}" &
        pid=$!; pid_pro_map[${pid}]="${project}"; pid_log_map[${pid}]="${logfile}"
        echo -e "Started action '\e[1m${action}\e[0m' for project '\e[1m${project}\e[0m' (pid ${pid})"
    done
    

    Additionally, the following could be employed to cut the number of background processes being spawned in half:

    ( cd "${project}" && exec clasp "${action}" "${args[@]}" ) &>"${logfile}" &
    

    This replaces the subshell's process with clasp, which should be perfectly fine, as the subshell loses its usefulness right after executing cd anyway.