I have an Oozie job that processes data incrementally. Going forward, I would like to run this job on an hourly basis to prepare the results as soon as possible. But to backfill old data, it would be faster to run sequential jobs processing a week's worth of data at a time.
Is it possible to have a single coordinator.xml
file that allows for both of these modes, and simply choose between them based on a flag specified ad-hoc when the job is scheduled?
In the parameters of the <coordinator-app>
tag in coordinator.xml
, there is a single frequency, which suggests that this is not possible, at least not in a natural way.
My eventual solution was to create the workflow at a given frequency (say, daily), and then create a second "backfill" workflow with a different frequency (weekly or monthly) that calls the original workflow as a sub-workflow.