Note: All examples below seek a one-line serial execution.
Dependencies on Taskflow API can be easily made serial if they don't return data that is used afterwards:
t1() >> t2()
If the task T1 returning a value and task T2 using it, you can also link them like this:
return_value = t1()
t2(return_value)
However, if you have to mix and match returning statements, this is no longer clear:
t1() >>
returned_value = t2()
t3(returned_value)
will fail due to syntax error (>>
operator cannot be used before a returning-value task).
Also, this would work, but not generate the serial (t1 >> t2 >> t3) dag required:
t1()
returned_value = t2()
t3(returned_value)
since then t1 and t2/t3 will run in parallel.
A way to make this is to force t2 to use a returned value from t1, even if not needed:
returned_fake_t1 = t1()
returned_value_t2 = t2(returned_fake_t1)
t3(returned_value_t2)
But this is not elegant since it needs to change the logic. What is the idiomatic way in Taskflow API to express this? I was not able to find such a scenario on Airflow Documentation.
I've struggled with the same problem, in the end it was solved it as:
t1_res = t1()
t2_res = t2()
t3_res = t3(t2_res)
t1_res >> t2_res >> t3_res
Good thing is that you don't really need to return anything from t1() or to add a fake parameter to f2(). Kudos to this article and it's author: https://khashtamov.com/ru/airflow-taskflow-api/