
Oozie - run a workflow every day or every hour


I have an Oozie workflow (hive_insertion.xml) that executes a .hive file, which inserts data into a table. The workflow is:

<workflow-app xmlns="uri:oozie:workflow:0.4" name="simple-Workflow">
   <start to="Insert_into_treatment_costs_table" />

   <action name="Insert_into_treatment_costs_table">
      <hive xmlns="uri:oozie:hive-action:0.4">
         <job-tracker>${jobTracker}</job-tracker>
         <name-node>${nameNode}</name-node>
         <script>hdfs_path_of_script/treatment_insert.hive</script>
      </hive>
      <!-- what should happen on success -->
      <ok to="end" />
      <!-- what should happen on failure -->
      <error to="kill_job" />
   </action>
   <!-- this is what happens on failure -->
   <kill name="kill_job">
      <message>Job failed</message>
   </kill>
   <!-- this is what happens on success -->
   <end name="end" />

</workflow-app>

I can run it from the directory that also contains the hive_insertion.xml file above:

# sudo -u oozie oozie job -oozie

Where do I make changes so that this workflow runs at the end of every day?


Solution

  • You need an Oozie coordinator to schedule Oozie workflows.

    To run the workflow at the end of every day, set the coordinator's frequency to the EL function ${coord:endOfDays(1)}.

    The coordinator action does not embed the workflow definition itself; its <workflow> element points, through <app-path>, at the HDFS location of your workflow:

    <coordinator-app name="daily" frequency="${coord:endOfDays(1)}" start="${start}" end="${end}" 
                    timezone="UTC" xmlns="uri:oozie:coordinator:0.1">
      <action>
         <workflow>
           ...
         </workflow>
      </action>
    </coordinator-app>
    

    Note: Oozie coordinators also accept cron syntax in the frequency attribute, which allows more flexible schedules than the EL functions.
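A fuller coordinator.xml, as a sketch rather than a drop-in file: it assumes the workflow above has been copied to HDFS as workflow.xml inside its own directory, and that workflowAppPath (a hypothetical parameter name chosen for this example), start, end, jobTracker and nameNode are all defined in the coordinator's job.properties. The <configuration> block forwards jobTracker and nameNode explicitly, because the Hive action in the workflow references them.

```xml
<!-- Sketch only: workflowAppPath is a hypothetical parameter. It, start, end,
     jobTracker and nameNode are assumed to be set in job.properties. -->
<coordinator-app name="daily" frequency="${coord:endOfDays(1)}"
                 start="${start}" end="${end}"
                 timezone="UTC" xmlns="uri:oozie:coordinator:0.1">
  <action>
    <workflow>
      <!-- HDFS directory holding the workflow, saved there as workflow.xml -->
      <app-path>${workflowAppPath}</app-path>
      <configuration>
        <!-- forward the jobTracker and nameNode values the Hive action uses -->
        <property>
          <name>jobTracker</name>
          <value>${jobTracker}</value>
        </property>
        <property>
          <name>nameNode</name>
          <value>${nameNode}</value>
        </property>
      </configuration>
    </workflow>
  </action>
</coordinator-app>
```

Submit the coordinator the same way as the workflow, but with job.properties setting oozie.coord.application.path (the directory holding coordinator.xml) instead of oozie.wf.application.path. For an hourly schedule, only the frequency attribute changes, to ${coord:hours(1)}. With cron syntax, a frequency such as "0 23 * * *" (minute, hour, day of month, month, day of week) would instead fire at 23:00 in the coordinator's timezone every day; check that your Oozie version supports cron frequencies before relying on it.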