I scheduled a coordinator which initiated many individual workflows. This was a backfill coordinator, with both startdate and enddate in the past.
A small percentage of these jobs failed due to temporary issues with the input datasets, and now I need to re-run those workflows (without re-running the successful workflows). These unsuccessful workflows have a variety of statuses: KILLED, FAILED, and SUSPENDED.
What is the best way to do this?
I ended up writing a bash script to do this. I won't copy the whole script here, but this was the general outline:
First, parse the output of oozie job -info to get a list of actions with a given status for a given coordinator:
# Collect the action numbers (the part after "-C@") for actions in the given status.
actions=$(oozie job -info "$oozie_coord" -filter status="$status" -len 1000 |
    grep -- '-C@' |
    awk '{print $1}' |
    sed -n 's/^.*@\([0-9]*\).*$/\1/p')
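To check the extraction step without a live Oozie server, the same pipeline can be wrapped in a function and fed sample text. This is only a minimal sketch: the function name extract_action_numbers and the sample lines are hypothetical. The sample merely imitates the shape of coordinator action IDs (something ending in -C@<number> in the first column); real oozie job -info output has more columns and header lines.

```shell
# Hypothetical helper: reads `oozie job -info`-style text on stdin and prints
# the number that follows "-C@" in the first column of each matching line.
extract_action_numbers() {
    grep -- '-C@' | awk '{print $1}' | sed -n 's/^.*@\([0-9][0-9]*\).*$/\1/p'
}

# Made-up sample lines (not real Oozie output), for illustration only.
sample='ID    Status
0000001-200101000000000-oozie-oozi-C@3    KILLED
0000001-200101000000000-oozie-oozi-C@17   KILLED'

printf '%s\n' "$sample" | extract_action_numbers
```

The header line contains no "-C@" and is dropped by grep, so only the two action numbers are printed, one per line.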
Then loop over these actions and issue rerun commands:
while read -r action; do
    oozie job -rerun "$oozie_coord" -action "$action"