apache-kafka · oozie · confluent-platform · oozie-coordinator · oozie-workflow

Error while executing shell-script using oozie


I'm trying to run kafka-connect-hdfs using Oozie version 4.2.0.2.6.5.0-292, via the script file sample.sh.
Yes, I know we can run the kafka-hdfs connector directly, but here it has to happen via Oozie.
Kafka has a topic sample with some data in it.
I'm trying to push that data to HDFS via Oozie.
I have referred to a lot of resources before coming here, but no luck.

ERROR

Launcher ERROR, reason: Main class [org.apache.oozie.action.hadoop.ShellMain], exit code [1]
2018-07-25 09:54:16,945  INFO ActionEndXCommand:520 - SERVER[nnuat.iot.com] USER[root] GROUP[-] TOKEN[] APP[sample] JOB[0000000-180725094930282-oozie-oozi-W] ACTION[0000000-180725094930282-oozie-oozi-W@shell1] ERROR is considered as FAILED for SLA

I have all three files (sample.sh, job.properties, workflow.xml) in HDFS under /user/root/sample and have granted permissions on all of them.

Note: I am running Oozie on a cluster, so all three nodes have the same paths and files as the namenode (/root/oozie-demo), and Confluent Kafka is installed at /opt/confluent-4.1.1 on each.
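
For reference, the files were staged in HDFS with commands along these lines (paths are from above; the exact permission mode is an assumption):

    # stage the workflow files into HDFS and grant permissions
    hdfs dfs -mkdir -p /user/root/sample
    hdfs dfs -put -f sample.sh job.properties workflow.xml /user/root/sample/
    hdfs dfs -chmod -R 755 /user/root/sample   # exact mode is an assumption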

job.properties

nameNode=hdfs://171.18.1.192:8020
jobTracker=171.18.1.192:8050
queueName=default
oozie.libpath=${nameNode}/user/oozie/share/lib/lib_20180703063118
oozie.wf.rerun.failnodes=true
oozie.use.system.libpath=true
oozieProjectRoot=${nameNode}/user/${user.name}
oozie.wf.application.path=${nameNode}/user/${user.name}/sample

workflow.xml

<workflow-app xmlns="uri:oozie:workflow:0.3" name="sample">
    <start to="shell1"/>
    <action name="shell1">
        <shell xmlns="uri:oozie:shell-action:0.1">
            <job-tracker>${jobTracker}</job-tracker>
            <name-node>${nameNode}</name-node>
            <configuration>
                <property>
                    <name>mapred.job.queue.name</name>
                    <value>${queueName}</value>
                </property>
                <property>
                    <name>hadoop.proxyuser.oozie.hosts</name>
                    <value>*</value>
                </property>
                <property>
                    <name>hadoop.proxyuser.oozie.groups</name>
                    <value>*</value>
                </property>
                <property>
                    <name>oozie.launcher.mapreduce.map.java.opts</name>
                    <value>-verbose</value>
                </property>
            </configuration>
            <!--<exec>${myscript}</exec>-->
            <exec>sample.sh</exec>
            <env-var>HADOOP_USER_NAME=${wf:user()}</env-var>
            <file>hdfs://171.18.1.192:8020/user/root/sample/sample.sh</file>
            <capture-output/>
        </shell>
        <ok to="end"/>
        <error to="fail"/>
    </action>
    <kill name="fail">
        <message>Shell action failed, error message[${wf:errorMessage(wf:lastErrorNode())}]</message>
    </kill>
    <kill name="fail-output">
        <message>Incorrect output, expected [Hello Oozie] but was [${wf:actionData('shell1')['my_output']}]</message>
    </kill>
    <end name="end"/>
</workflow-app>

sample.sh

#!/bin/bash
sudo /opt/confluent-4.1.1/bin/connect-standalone /opt/confluent-4.1.1/etc/schema-registry/connect-avro-standalone.properties /opt/confluent-4.1.1/etc/kafka-connect-hdfs/IOT_DEMO-hdfs.properties
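
For context, the workflow is then submitted with the Oozie CLI along these lines (the server URL is inferred from the hostname in the log plus the default Oozie port, so treat it as an assumption):

    # submit the workflow; the Oozie server URL is an assumption
    oozie job -oozie http://nnuat.iot.com:11000/oozie -config job.properties -run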

I have not been able to find the cause of the error. I have also tried putting all the jars inside confluent-kafka into the oozie/lib directory in HDFS.

Link to the YARN and Oozie error logs: yarn-oozie-error-logs

Thanks!


Solution

Kafka Connect is meant to run entirely as a standalone process, not to be scheduled via Oozie.

It never dies except in the event of an error, and if Oozie relaunches a failed task you are almost guaranteed to get duplicated data on HDFS, because in standalone mode the Connect offsets are not persisted anywhere except the local disk (assume Connect restarts on a separate machine). So I don't see the point of this.
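
A quick way to see this: standalone mode keeps offsets only in a local file, configured in the same properties file your script passes to connect-standalone (the default path shown is an assumption):

    # standalone Connect persists offsets to a local file only
    grep offset.storage.file /opt/confluent-4.1.1/etc/schema-registry/connect-avro-standalone.properties
    # typically prints something like:
    # offset.storage.file.filename=/tmp/connect.offsets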

You should instead run connect-distributed.sh independently as a system service on a dedicated set of machines, then POST the connector config JSON to the Connect HTTP endpoint, as sketched below. Tasks then get distributed by the Connect framework, and offsets are stored persistently back into a Kafka topic for fault tolerance.
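
A minimal sketch of that, assuming a Connect worker listening on connect-host:8083 (the connector name and flush.size are placeholders; the topic and HDFS URL come from the question):

    # register an HDFS sink connector with a distributed Connect cluster
    curl -X POST -H "Content-Type: application/json" \
      http://connect-host:8083/connectors \
      -d '{
        "name": "iot-demo-hdfs",
        "config": {
          "connector.class": "io.confluent.connect.hdfs.HdfsSinkConnector",
          "tasks.max": "1",
          "topics": "sample",
          "hdfs.url": "hdfs://171.18.1.192:8020",
          "flush.size": "3"
        }
      }'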


If you absolutely want to use Oozie, Confluent includes the Camus tool, which is deprecated in favor of Connect, but I've been maintaining a Camus+Oozie process for a while and it works quite well; it's just hard to monitor for failures once lots of topics are added. Apache Gobblin is the second iteration of that project, not maintained by Confluent.

It also appears you're running HDP, so Apache NiFi could be installed on your cluster as well to handle Kafka and HDFS related tasks.