I have several hundred Google Apps Script projects and a variety of Bash scripts for managing them using the clasp tool (a Node.js app). Many of the scripts need to run clasp pull first to pull the projects locally before acting on the local files, so I have a script which loops through the local clasp project folders and runs clasp pull on each. The loop iterates through the directories sequentially, so if it takes 3-4 seconds to pull a project, it takes 5-6 minutes per 100 projects.
My goal is to run the clasp pull commands in parallel so that they all start at the same time, and to know which projects were pulled successfully and which failed.
Given a directory structure like this:
├── project-1
│ ├── .clasp.json
│ ├── .claspignore
│ ├── _main.js
│ └── appsscript.json
├── project-2
│ ├── .clasp.json
│ ├── .claspignore
│ ├── _main.js
│ └── appsscript.json
├── project-3
│ ├── .clasp.json
│ ├── .claspignore
│ ├── _main.js
│ └── appsscript.json
└── pull_all.sh
And this pull_all.sh Bash script:
#!/bin/bash
# use Node 14.17.5 to prevent "Error: Looks like you are offline." errors
# (see https://github.com/google/clasp/issues/872)
[ -s "/usr/local/opt/nvm/nvm.sh" ] && . "/usr/local/opt/nvm/nvm.sh"
nvm install 14.17.5
nvm use 14.17.5
find . -name '.clasp.json' |
while read file; do
    (
        cd "$(dirname "$file")"
        project_dir_name="$(basename "$(pwd)")"
        echo "Pulling project ($project_dir_name)"
        clasp pull
    ) &
done
When I run this script, it outputs the "Pulling project" line for each directory, then gives a shell prompt, implying that the script has finished executing. But then, without the user doing anything, 3-4 seconds later it shows the output of all the clasp pull commands (apparently running in parallel, because some of the output is out of order or overlapping), then hangs and does not give a new shell prompt. At this point I have to press Ctrl+C to terminate the script.
The complete output ends up looking like this:
$ ./pull_all.sh
v14.17.5 is already installed.
Now using node v14.17.5 (npm v6.14.14)
Now using node v14.17.5 (npm v6.14.14)
Pulling project (project-3)
Pulling project (project-2)
Pulling project (project-1)
$
Cloned 2 files.
⠙ Pulling files…└─ appsscript.json
└─ _main.js
Cloned 2 files.
└─ _main.js
└─ appsscript.json
Cloned 2 files.
└─ _main.js
To force one of the scripts to fail, I can change the scriptId to an invalid script ID in any of the .clasp.json files. In this case I do see the expected output of:
Could not find script.
Did you provide the correct scriptId?
Are you logged in to the correct account with the script?
... but it's still mixed in with the rest of the output and it's not clear which project that came from.
How can I make it so that:
- The script shows the success or failure of each clasp pull operation, referenced by the directory name of the project (where the .clasp.json file was found).
- The output of clasp pull is suppressed, so the script only shows the success or failure result of each project (referenced by the directory name).
Note: I've mentioned clasp pull as an example command, but a valid solution would allow me to run any clasp command as a background process in a Bash while loop, including, but not limited to, clasp push, clasp deploy, etc.
I'd suggest the following solution:
#!/usr/bin/env bash
# use Node 14.17.5 to prevent "Error: Looks like you are offline." errors
# (see https://github.com/google/clasp/issues/872)
[ -s "/usr/local/opt/nvm/nvm.sh" ] && . "/usr/local/opt/nvm/nvm.sh"
nvm install 14.17.5
nvm use 14.17.5
# Check and process command line
if (( $# < 1 )); then
    echo "Usage: $(basename "$0") ACTION [ARG]..."
    exit 2
fi
action="$1"
args=("${@:2}")
# Define cleanup handler, create temporary log directory
trap '[[ -n "$(jobs -p)" ]] && kill -- -$$; [[ -n "${logdir}" ]] && rm -rf "${logdir}"' EXIT
logdir=$(mktemp -d)
# Start specified action for each project
declare -A pid_pro_map=() pid_log_map=()
readarray -t files < <(find . -name '.clasp.json' -printf "%P\n" | sort -V)
for file in "${files[@]}"; do
    project=$(dirname "${file}")
    logfile=$(mktemp -p "${logdir}")
    ( cd "${project}" && clasp "${action}" "${args[@]}" ) &>"${logfile}" &
    pid=$!; pid_pro_map[${pid}]="${project}"; pid_log_map[${pid}]="${logfile}"
    echo -e "Started action '\e[1m${action}\e[0m' for project '\e[1m${project}\e[0m' (pid ${pid})"
done
# Wait for background jobs to finish and report results
echo -e "\nWaiting for background jobs to finish...\n"
jobs_done=0; jobs_total=${#files[@]}
while true; do
    # wait -p unsets pid before assigning, so it stays empty once no jobs remain
    wait -n -p pid; result=$?
    [[ -z "${pid}" ]] && break
    jobs_done=$((jobs_done + 1))
    if (( ${result} == 0 )); then
        echo -e "Action '\e[1m${action}\e[0m' for project '\e[1m${pid_pro_map[${pid}]}\e[0m' (pid ${pid}) (${jobs_done}/${jobs_total}): \e[1;32mSUCCESS\e[0m"
    else
        echo -e "Action '\e[1m${action}\e[0m' for project '\e[1m${pid_pro_map[${pid}]}\e[0m' (pid ${pid}) (${jobs_done}/${jobs_total}): \e[1;31mFAILURE\e[0m"
        cat "${pid_log_map[${pid}]}"
    fi
done
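To see the reporting loop in isolation, here is a minimal sketch that swaps clasp for stand-in commands (one subshell that exits 0 and one that exits 3). The project names and the succeeded/failed arrays are made up for illustration, and like the script above it assumes Bash >= 5.1 for wait -p:

```bash
#!/usr/bin/env bash
# Stand-in jobs instead of clasp; the project names are hypothetical.
declare -A pid_name=()
succeeded=(); failed=()

( sleep 0.2; exit 0 ) & pid_name[$!]="project-ok"
( sleep 0.1; exit 3 ) & pid_name[$!]="project-bad"

# Collect jobs one at a time, in whatever order they finish.
while true; do
    wait -n -p pid; status=$?
    # pid is left unset when there is nothing left to wait for
    [[ -z "${pid}" ]] && break
    if (( status == 0 )); then
        succeeded+=("${pid_name[${pid}]}")
    else
        failed+=("${pid_name[${pid}]} (exit ${status})")
    fi
done

echo "succeeded: ${succeeded[*]}"
echo "failed: ${failed[*]}"
```

The exit status returned by wait -n is that of the job named in pid, which is what lets the loop attribute each result to a project.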
Features:
- Supports any action of clasp (e.g. pull, push, deploy)
- Output of clasp is suppressed (but captured to be printed in case of failure)
- (... clasp for further analysis in case of failure)
- Progress is shown as (<projects-done>/<projects-total>)
Requirements:
- Bash >= 5.1 (for wait -p; Bash >= 4.3 for wait -n, Bash >= 4.0 for associative arrays)
- GNU find (for find ... -printf "%P\n"); possible workaround:
readarray -t files < <(find . -name '.clasp.json' | sort -V)
for file in "${files[@]}"; do
project=$(dirname "${file#'./'}")
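If upgrading Bash is not an option, one possible fallback (not what the script above does) is to wait for each PID in start order with plain wait PID, which also avoids associative arrays. The trade-off is that results are reported in start order rather than completion order. A rough sketch with stand-in commands instead of clasp; run_job and the project names are hypothetical:

```bash
#!/usr/bin/env bash
# Portable variant: no wait -n, no wait -p, no associative arrays.
names=(); pids=()

# Hypothetical helper: run_job NAME COMMAND [ARG]...
run_job() {
    local name="$1"; shift
    "$@" &>/dev/null &
    names+=("${name}"); pids+=("$!")
}

run_job "project-1" true
run_job "project-2" false
run_job "project-3" sleep 0.1

# wait PID returns the exit status of that specific background job,
# even if the job finished before wait was called.
report=()
for i in "${!pids[@]}"; do
    if wait "${pids[$i]}"; then
        report+=("${names[$i]}: SUCCESS")
    else
        report+=("${names[$i]}: FAILURE")
    fi
done
printf '%s\n' "${report[@]}"
```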
In response to this comment, here is a possible tweak to limit the number of concurrent background jobs being spawned:
# Start specified action for each project
max_jobs=25; poll_delay="0.1s"
declare -A pid_pro_map=() pid_log_map=()
readarray -t files < <(find . -name '.clasp.json' -printf "%P\n" | sort -V)
for file in "${files[@]}"; do
    if (( ${max_jobs} > 0 )); then
        while jobs=$(jobs -r -p | wc -l) && (( ${jobs} >= ${max_jobs} )); do
            sleep "${poll_delay}"
        done
    fi
    project=$(dirname "${file}")
    logfile=$(mktemp -p "${logdir}")
    ( cd "${project}" && clasp "${action}" "${args[@]}" ) &>"${logfile}" &
    pid=$!; pid_pro_map[${pid}]="${project}"; pid_log_map[${pid}]="${logfile}"
    echo -e "Started action '\e[1m${action}\e[0m' for project '\e[1m${project}\e[0m' (pid ${pid})"
done
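The throttle can be tried on its own with stand-in jobs. This sketch assumes sleep accepts fractional seconds (as GNU sleep does); the peak and started counters are not part of the script above and exist only to check that the limit holds:

```bash
#!/usr/bin/env bash
max_jobs=2; poll_delay="0.05"
peak=0; started=0

for i in 1 2 3 4 5 6; do
    # Block until fewer than max_jobs background jobs are running
    while running=$(jobs -r -p | wc -l) && (( running >= max_jobs )); do
        sleep "${poll_delay}"
    done
    sleep 0.2 &    # stand-in for the clasp call
    started=$((started + 1))
    # Record the highest number of simultaneously running jobs seen
    running=$(jobs -r -p | wc -l)
    (( running > peak )) && peak=$((running))
done
wait

echo "started=${started} peak=${peak} (limit ${max_jobs})"
```

Because a new job is only started once the running count has dropped below max_jobs, the recorded peak should never exceed the limit.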
Additionally, this could be employed to cut the number of background processes being spawned in half:
( cd "${project}" && exec clasp "${action}" "${args[@]}" ) &>"${logfile}" &
This replaces the subshell's process with clasp, which should be perfectly fine, as the subshell loses its usefulness right after executing cd anyway.
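A quick way to check this, sketched with bash -c standing in for clasp: with exec, the PID reported by $! is the PID of the command itself, because the command takes over the subshell's process. The temporary file and the variable names are only for illustration:

```bash
#!/usr/bin/env bash
out=$(mktemp)

# The stand-in command prints its own PID ($$ inside the single quotes
# is expanded by the inner bash, not by this script).
( cd / && exec bash -c 'echo "$$"' ) >"${out}" &
bg_pid=$!
wait "${bg_pid}"

child_pid=$(<"${out}")
rm -f "${out}"

echo "background pid: ${bg_pid}, pid seen by the command: ${child_pid}"
```

Both numbers should match, so the PIDs stored in pid_pro_map and pid_log_map would refer directly to the clasp processes.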