I'm trying to run kafka-connect-hdfs via Oozie (version 4.2.0.2.6.5.0-292) using a script file, sample.sh.
Yes, I know the kafka-hdfs connector can be run directly, but it should happen via Oozie.
Kafka has a topic sample with some data in it, and I'm trying to push that data to HDFS via Oozie.
I have referred to a lot of resources before coming here, but no luck.
ERROR
Launcher ERROR, reason: Main class [org.apache.oozie.action.hadoop.ShellMain], exit code [1]
2018-07-25 09:54:16,945 INFO ActionEndXCommand:520 - SERVER[nnuat.iot.com] USER[root] GROUP[-] TOKEN[] APP[sample] JOB[0000000-180725094930282-oozie-oozi-W] ACTION[0000000-180725094930282-oozie-oozi-W@shell1] ERROR is considered as FAILED for SLA
I have all three files (sample.sh, job.properties, workflow.xml) in HDFS, with permissions granted on all of them, under the location /user/root/sample.
Note: Oozie is running in a cluster, so all three nodes have the same paths and files as the namenode (/root/oozie-demo), and confluent-kafka (/opt/confluent-4.1.1) as well.
job.properties
nameNode=hdfs://171.18.1.192:8020
jobTracker=171.18.1.192:8050
queueName=default
oozie.libpath=${nameNode}/user/oozie/share/lib/lib_20180703063118
oozie.wf.rerun.failnodes=true
oozie.use.system.libpath=true
oozieProjectRoot=${nameNode}/user/${user.name}
oozie.wf.application.path=${nameNode}/user/${user.name}/sample
workflow.xml
<workflow-app xmlns="uri:oozie:workflow:0.3" name="sample">
    <start to="shell1"/>
    <action name="shell1">
        <shell xmlns="uri:oozie:shell-action:0.1">
            <job-tracker>${jobTracker}</job-tracker>
            <name-node>${nameNode}</name-node>
            <configuration>
                <property>
                    <name>mapred.job.queue.name</name>
                    <value>${queueName}</value>
                </property>
                <property>
                    <name>hadoop.proxyuser.oozie.hosts</name>
                    <value>*</value>
                </property>
                <property>
                    <name>hadoop.proxyuser.oozie.groups</name>
                    <value>*</value>
                </property>
                <property>
                    <name>oozie.launcher.mapreduce.map.java.opts</name>
                    <value>-verbose</value>
                </property>
            </configuration>
            <!--<exec>${myscript}</exec>-->
            <exec>smaple.sh</exec>
            <env-var>HADOOP_USER_NAME=${wf:user()}</env-var>
            <file>hdfs://171.18.1.192:8020/user/root/sample/smaple.sh</file>
            <capture-output/>
        </shell>
        <ok to="end"/>
        <error to="fail"/>
    </action>
    <kill name="fail">
        <message>Shell action failed, error message[${wf:errorMessage(wf:lastErrorNode())}]</message>
    </kill>
    <kill name="fail-output">
        <message>Incorrect output, expected [Hello Oozie] but was [${wf:actionData('shellaction')['my_output']}]</message>
    </kill>
    <end name="end"/>
</workflow-app>
sample.sh
#!/bin/bash
sudo /opt/confluent-4.1.1/bin/connect-standalone /opt/confluent-4.1.1/etc/schema-registry/connect-avro-standalone.properties /opt/confluent-4.1.1/etc/kafka-connect-hdfs/IOT_DEMO-hdfs.properties
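For reference, IOT_DEMO-hdfs.properties is a standard kafka-connect-hdfs sink config along these lines (the values shown here are illustrative placeholders, not my exact file):

# HDFS sink connector properties (illustrative)
name=iot-demo-hdfs-sink
connector.class=io.confluent.connect.hdfs.HdfsSinkConnector
tasks.max=1
topics=sample
hdfs.url=hdfs://171.18.1.192:8020
flush.size=3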
I am not able to find the cause of the error. I have also tried putting all the jars from confluent-kafka into the oozie/lib directory in HDFS.
Link to the YARN and Oozie error logs: yarn-oozie-error-logs
Thanks!
Kafka Connect is meant to run entirely as a standalone process, not to be scheduled via Oozie.
It never dies except in the event of an error, and if Oozie relaunches a failed task, you're almost guaranteed to get duplicated data on HDFS, because in standalone mode the Connect offsets are not persistently stored anywhere except on local disk (assume Connect restarts on a separate machine), so I don't see the point of this.
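To illustrate: the worker config you're passing (connect-avro-standalone.properties) keeps offsets in a plain local file, typically something like this (the path below is the usual default in Confluent's sample config):

# Offsets live only on this machine's disk, so relaunching the
# connector on a different node loses them.
offset.storage.file.filename=/tmp/connect.offsets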
You should instead be running connect-distributed.sh independently as a system service on a dedicated set of machines, then POST the connector config JSON to the Connect HTTP endpoint. Tasks will then be distributed as part of the Connect framework, and offsets are stored persistently back into a Kafka topic for fault tolerance.
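For example (a rough sketch; the host name and connector settings are placeholders for your environment, and 8083 is the default Connect REST port):

curl -X POST -H "Content-Type: application/json" \
  http://connect-host:8083/connectors \
  -d '{
    "name": "iot-demo-hdfs-sink",
    "config": {
      "connector.class": "io.confluent.connect.hdfs.HdfsSinkConnector",
      "tasks.max": "1",
      "topics": "sample",
      "hdfs.url": "hdfs://171.18.1.192:8020",
      "flush.size": "3"
    }
  }'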
If you absolutely want to use Oozie, Confluent includes the Camus tool, which is deprecated in favor of Connect, but I've been maintaining a Camus + Oozie process for a while and it works quite well; it's just hard to monitor for failures once lots of topics are added. Apache Gobblin is the second iteration of that project, not maintained by Confluent.
It also appears you're running HDP, so Apache NiFi can be installed on your cluster as well for handling Kafka and HDFS related tasks.