
How to set up Oozie coordinator with a simple input event?


How do you set up an Oozie coordinator input event that can either be there or not? Please fill in the ??? for the following:

<coordinator-app name="${jobName}" frequency="${coord:days(1)}" start="${startTime}" end="${endTime}" timezone="${timezone}" xmlns="uri:oozie:coordinator:0.2">
    <controls>
        <timeout>-1</timeout>
        <concurrency>30</concurrency>
        <execution>FIFO</execution>
    </controls>

    <datasets>
        <dataset name="myData" frequency="???" initial-instance="???" timezone="UTC">
            <uri-template>/time-independent/path/that/may/or/maynot/be/there</uri-template>
        </dataset>
    </datasets>

    <input-events>
        <data-in name="myInput" dataset="myData">
            <instance>???</instance>
        </data-in>
    </input-events>

    <action>
        <workflow>
            <app-path>${myAppPath}</app-path>
            <configuration>
                <property>
                    <name>myInput</name>
                    <value>${coord:dataIn('myInput')}</value>
                </property>
            </configuration>
        </workflow>
    </action>

</coordinator-app>

Thanks, Alvaro


Solution

  • The dataset frequency will be ${coord:days(1)}, since you want the job to run every day.

    Both initial-instance and the <instance> value will be ${startTime}, the time you want the job to start, e.g. 2017-01-22T12:00Z.

    So the updated coordinator XML would look something like this:

    <coordinator-app name="${jobName}" frequency="${coord:days(1)}" start="${startTime}" end="${endTime}" timezone="${timezone}"
        xmlns="uri:oozie:coordinator:0.2">
        <controls>
            <timeout>-1</timeout>
            <concurrency>30</concurrency>
            <execution>FIFO</execution>
        </controls>
        <datasets>
            <dataset name="myData" frequency="${coord:days(1)}" initial-instance="${startTime}" timezone="UTC">
                <uri-template>/time-independent/path/that/may/or/maynot/be/there</uri-template>
            </dataset>
        </datasets>
        <input-events>
            <data-in name="myInput" dataset="myData">
                <instance>${startTime}</instance>
            </data-in>
        </input-events>
        <action>
            <workflow>
                <app-path>${myAppPath}</app-path>
                <configuration>
                    <property>
                        <name>myInput</name>
                        <value>${coord:dataIn('myInput')}</value>
                    </property>
                </configuration>
            </workflow>
        </action>
    </coordinator-app>
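
    A caveat on the configuration above: a data-in event is not optional. Oozie keeps each coordinator action WAITING until the dataset instance is available. With <timeout>-1</timeout> it waits indefinitely, while a positive timeout (in minutes) makes the action time out instead of running. By default, "available" means a _SUCCESS file exists inside the resolved directory. If the directory's own existence should count as the data being there, one option is an empty <done-flag> in the dataset. The fragment below is a sketch under that assumption, and the rest of the coordinator stays as above:

    <!-- Sketch (assumption): an empty done-flag makes the existence of the
         directory itself mark the instance as available, instead of the
         default _SUCCESS file inside it. The action still waits until the
         directory exists. -->
    <datasets>
        <dataset name="myData" frequency="${coord:days(1)}" initial-instance="${startTime}" timezone="UTC">
            <uri-template>/time-independent/path/that/may/or/maynot/be/there</uri-template>
            <done-flag></done-flag>
        </dataset>
    </datasets>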