Tags: shell, hadoop, oozie, spark-submit, oozie-workflow

spark-submit is not working inside a shell script submitted through an Oozie workflow


Through an Oozie workflow I have submitted a shell script that contains a spark-submit command.

I uploaded the shell script to HDFS at /user/admin/first.sh through the Oozie console. The script runs fine up to the spark-submit command; when it reaches spark-submit it fails. The reason is that spark-submit exists on my local file system, but the script is running on the Hadoop cluster as the hadoop admin user, where it is not available. Is there any way around this? How can I run a local file system command (spark-submit) as the hadoop user on the cluster, or can I copy the script from HDFS to the local file system with the help of Oozie?

/usr/lib/spark/bin/spark-submit \
  --driver-java-options "-Dcurrent.job.id=$1 -Dexecutive.transform.dumpname=$dump_name -Dexecutive.transform.source=$SOURCE -Dexecutive.transform.jobid=$1 -Dexecutive.transform.run=$run_id -Dlogging.job.type=$JOB_TYPE -Dlogging.module.name=$MODULE_NAME" \
  --conf spark.executor.extraJavaOptions="-Dcurrent.job.id=$1 -Dexecutive.transform.dumpname=$dump_name -Dexecutive.transform.source=$SOURCE -Dexecutive.transform.jobid=$1 -Dexecutive.transform.run=$run_id -Dlogging.job.type=$JOB_TYPE -Dlogging.module.name=$MODULE_NAME" \
  --master yarn-cluster \
  --deploy-mode cluster \
  --conf spark.yarn.user.classpath.first=true \
  --class com.insideview.transform.ExecutiveTransformerSparkPipelineJob \
  --jars $5/deploy/etl/dp-properties/DPProperties-$dp_version.jar,$5/deploy/etl/contact-transform/jars/ExecNameTransformer-$dp_version.jar,$5/deploy/hbase/lib/hbase-client.jar,$5/deploy/hbase/lib/hbase-common.jar,$5/deploy/hbase/lib/hbase-server.jar,$5/deploy/hbase/lib/protobuf-java-2.5.0.jar,$5/deploy/hbase/lib/hbase-protocol.jar,$5/deploy/hbase/lib/htrace-core-3.1.0-incubating.jar,$5/deploy/etl/contact-transform/jars/
./first.sh: line 64: /usr/lib/spark/bin/spark-submit: No such file or directory

I have a few DB steps, such as SELECT statements, which run fine before spark-submit. When the script reaches the spark-submit step it fails, because the binary is not present on the local file system of the node running the script.
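One way to confirm this diagnosis is to guard the call with an existence check, so the script fails fast with a clear message naming the node it landed on. A minimal sketch (the path is the one from the command above; `require_binary` is a hypothetical helper, not part of Oozie):

```shell
#!/bin/sh
# Guard helper: verify a required binary exists and is executable
# before the script tries to run it.
require_binary() {
    if [ ! -x "$1" ]; then
        # The shell action may have landed on a node without a Spark client.
        echo "ERROR: $1 not found on node $(hostname)" >&2
        return 1
    fi
}

# In first.sh you would call this just before the spark-submit line:
# require_binary /usr/lib/spark/bin/spark-submit || exit 1
```

Run this way, the job log will show which NodeManager host was missing the Spark client, instead of only the bare "No such file or directory" error.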


Solution

  • The reason it's failing is that a shell action will run on an arbitrary node in your cluster, and those nodes do not have spark-submit installed on them (hence the "No such file or directory" error).

    You have two options:

    1. Continue using a shell script, and install spark-submit (the Spark client) on every data node in your cluster, since the action may run on any of them.
    2. Use an Oozie Spark action instead, which does not depend on a node-local spark-submit binary.
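For option 2, the Spark action lets Oozie launch the job itself rather than shelling out to a local binary. A minimal workflow sketch, assuming the class from the command above and an illustrative jar path and workflow name (adjust the schema versions, properties, and `<spark-opts>` to your setup):

```xml
<workflow-app name="exec-transform-wf" xmlns="uri:oozie:workflow:0.5">
    <start to="spark-node"/>
    <action name="spark-node">
        <spark xmlns="uri:oozie:spark-action:0.1">
            <job-tracker>${jobTracker}</job-tracker>
            <name-node>${nameNode}</name-node>
            <master>yarn-cluster</master>
            <name>ExecutiveTransformer</name>
            <class>com.insideview.transform.ExecutiveTransformerSparkPipelineJob</class>
            <!-- Illustrative HDFS path; point this at your application jar -->
            <jar>${nameNode}/user/admin/lib/ExecNameTransformer.jar</jar>
            <spark-opts>--conf spark.yarn.user.classpath.first=true</spark-opts>
        </spark>
        <ok to="end"/>
        <error to="fail"/>
    </action>
    <kill name="fail">
        <message>Spark job failed: [${wf:errorMessage(wf:lastErrorNode())}]</message>
    </kill>
    <end name="end"/>
</workflow-app>
```

The extra HBase and properties jars passed with `--jars` in the original command can be placed in the workflow's `lib/` directory in HDFS, where Oozie adds them to the job's classpath automatically.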