I have a workflow that forks into three parallel paths:
<start to="MY_FORK"/>
<fork name="MY_FORK">
<path start="START_PARALLEL_PATH_1"/>
<path start="START_PARALLEL_PATH_2"/>
<path start="START_PARALLEL_PATH_3"/>
</fork>
The three paths start a series of actions, each of which can fail.
The obvious solution is the following DAG, in which further actions follow the join. The problem with this DAG is that if a path reaches a kill node (for example, in the top path), all the other paths are killed as well before they reach the join.
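For concreteness, the top path of such a DAG might be defined roughly as below. This is a sketch: the action bodies are left as comments, and the names `kill_A` and `my-join` are illustrative. Because each error transition leads straight to a kill node, a failure anywhere in A1, A2 or A3 terminates the whole workflow, including the paths that are still running.

```xml
<!-- Top path of the fork (sketch, illustrative names).
     Every error transition goes directly to a kill node. -->
<action name="A1">
    <!-- action definition (shell, map-reduce, hive, ...) omitted -->
    <ok to="A2"/>
    <error to="kill_A"/>
</action>
<action name="A2">
    <!-- action definition omitted -->
    <ok to="A3"/>
    <error to="kill_A"/>
</action>
<action name="A3">
    <!-- action definition omitted -->
    <ok to="my-join"/>
    <error to="kill_A"/>
</action>

<!-- Reaching this node ends the entire workflow, not just this path. -->
<kill name="kill_A">
    <message>Path A failed</message>
</kill>
```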
This is not the flow I want. If an action in one parallel path fails, only that path should stop; the other paths should keep running until the join. For example, if action A2 fails, action A3 should be skipped, but C1, C2 and C3 should still be executed. The decision node after the join would then detect that an error happened and terminate the workflow.
Do you know how I could achieve that?
Option 1: Using wf:actionExternalStatus
In this case, the idea is simple: use the external status of a node to determine what to do next.
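The external status of a node is RUNNING, KILLED, FAILED, SUCCEEDED, "FAILED/KILLED", or empty if the node was skipped, so we need to detect any status that contains FAILED or KILLED. Given the limited set of default Oozie EL functions, we can use replaceAll to map every such status to 'FAILED' and compare the result. The checks for one branch are combined with `or` inside a single ${...} expression, so that each case evaluates to a plain true or false: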
<decision name="check-if-action-failed">
<switch>
<case to="kill_A">
${replaceAll(wf:actionExternalStatus('A1'), '.*FAILED.*|.*KILLED.*', 'FAILED') eq 'FAILED' or
  replaceAll(wf:actionExternalStatus('A2'), '.*FAILED.*|.*KILLED.*', 'FAILED') eq 'FAILED' or
  replaceAll(wf:actionExternalStatus('A3'), '.*FAILED.*|.*KILLED.*', 'FAILED') eq 'FAILED'}
</case>
<case to="kill_C">
${replaceAll(wf:actionExternalStatus('C1'), '.*FAILED.*|.*KILLED.*', 'FAILED') eq 'FAILED' or
  replaceAll(wf:actionExternalStatus('C2'), '.*FAILED.*|.*KILLED.*', 'FAILED') eq 'FAILED' or
  replaceAll(wf:actionExternalStatus('C3'), '.*FAILED.*|.*KILLED.*', 'FAILED') eq 'FAILED'}
</case>
<default to="next-action"/>
</switch>
</decision>
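For this decision to be reached at all, a failing action must not transition to a kill node, since that ends every path. A minimal sketch, assuming each action's error transition leads to the join instead (so the failing path simply stops there while the other paths continue), and assuming your Oozie version's fork/join validation accepts an error transition into a join. Node names other than `check-if-action-failed` are illustrative.

```xml
<!-- Sketch: errors in path A go to the join instead of a kill node,
     so only path A stops; the other paths keep running. -->
<action name="A1">
    <!-- action definition omitted -->
    <ok to="A2"/>
    <error to="my-join"/>
</action>
<!-- A2 (ok to A3) and A3 (ok to my-join) use the same error transition -->

<!-- After every path has arrived, inspect the external statuses. -->
<join name="my-join" to="check-if-action-failed"/>
```

Actions skipped because of an earlier failure (A3 when A2 fails) have an empty external status, so only the actions that actually failed match the check.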
Option 2: Using a shell-script
The workaround shown below uses a shell action in the KO nodes: the nodes where a parallel branch should stop executing while the other parallel branches still reach the join. In the image above these are KO_A and KO_C. Each of them invokes a shell script, echo-ko.sh, that simply does:
echo "STATUS=KO"
In Oozie terms, KO_A looks like this (KO_C is analogous):
<action name="KO_A" retry-max="3" retry-interval="2">
<shell xmlns="uri:oozie:shell-action:0.3">
<exec>echo-ko.sh</exec>
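<!-- HDFS path of the script; Oozie paths use forward slashes -->
<file>/path/to/script/echo-ko.sh</file>
<!-- exposes STATUS=KO to later nodes via wf:actionData('KO_A') -->
<capture-output/>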
</shell>
<ok to="my-join"/>
<error to="some-fail-node"/>
</action>
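The KO node only takes effect if the regular actions of the branch route their errors to it. A sketch of what that might look like for A1 (the action body is omitted; A2 and A3 would use the same error transition):

```xml
<!-- Sketch: on failure, A1 goes to KO_A, which records STATUS=KO
     and then continues to my-join, so only path A stops. -->
<action name="A1">
    <!-- action definition omitted -->
    <ok to="A2"/>
    <error to="KO_A"/>
</action>
```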
Decision after the join
<decision name="check-branch-ko">
<switch>
<case to="kill_A">
${wf:actionData('KO_A')['STATUS'] eq 'KO'}
</case>
<!-- Other cases -->
<default to="..."/>
</switch>
</decision>
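The decision relies on a join that leads to it and on the kill nodes it targets. A possible sketch of those nodes follows; the kill messages are made up, and defining `some-fail-node` as a kill node is an assumption, since above it is only named as the error target of KO_A.

```xml
<!-- Waits until every path has arrived, including a path that stopped
     at KO_A or KO_C, then hands control to the decision node. -->
<join name="my-join" to="check-branch-ko"/>

<!-- Targets of the decision node (illustrative messages). -->
<kill name="kill_A">
    <message>Path A failed (KO_A reported STATUS=KO)</message>
</kill>
<kill name="kill_C">
    <message>Path C failed (KO_C reported STATUS=KO)</message>
</kill>

<!-- Error target of KO_A itself, used if the echo-ko.sh action fails. -->
<kill name="some-fail-node">
    <message>KO node failed: ${wf:errorMessage(wf:lastErrorNode())}</message>
</kill>
```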