Scheduling Spark jobs on a regular basis


Which of these tools is recommended for scheduling Spark jobs on a daily/weekly basis? 1) Oozie 2) Luigi 3) Azkaban 4) Chronos 5) Airflow

Thanks in advance.


Solution

  • Updating my previous answer from here: Suggestion for scheduling tool(s) for building hadoop based data pipelines

    Philosophy:

    Simpler pipelines are better than complex pipelines: they are easier to create, easier to understand (especially when you didn’t create them), and easier to debug/fix.

    When complex actions are needed you want to encapsulate them in a way that either completely succeeds or completely fails.

    If you can make it idempotent (running it again creates identical results) then that’s even better.
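
As a rough illustration of the "completely succeeds or completely fails" and idempotency points above, here is a minimal Python sketch. It assumes the step's output is a single file on a local POSIX filesystem; `run_step` and its parameters are hypothetical names made up for this example and do not belong to any of the schedulers mentioned. A real Spark job writing to HDFS or object storage would need that system's own equivalent of an atomic publish.

```python
import os
import tempfile


def run_step(output_path, compute):
    """Run `compute` and publish its result at `output_path`.

    Hypothetical helper for illustration only. Returns True if work was
    done, False if the output already existed and the step was skipped.
    """
    # Idempotent: if an earlier run already published the output, do nothing.
    if os.path.exists(output_path):
        return False

    out_dir = os.path.dirname(os.path.abspath(output_path))
    # Write to a temporary file in the same directory so the final rename
    # stays on one filesystem.
    fd, tmp_path = tempfile.mkstemp(dir=out_dir, suffix=".tmp")
    try:
        with os.fdopen(fd, "w") as tmp:
            tmp.write(compute())
        # Assumption: POSIX rename semantics, where os.replace is atomic,
        # so readers see either no file or the complete file.
        os.replace(tmp_path, output_path)
    except BaseException:
        # All-or-nothing: a failed run leaves no partial output behind.
        if os.path.exists(tmp_path):
            os.remove(tmp_path)
        raise
    return True


if __name__ == "__main__":
    target = os.path.join(tempfile.mkdtemp(), "daily_report.txt")
    print(run_step(target, lambda: "rows=42\n"))  # first run does the work
    print(run_step(target, lambda: "rows=42\n"))  # rerun is a no-op
```

With a step shaped like this, the scheduler can simply retry or rerun a failed day without any cleanup logic, whichever tool is doing the scheduling.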