shell, hadoop, cloudera, oozie-workflow

Hadoop command to run bash script in hadoop cluster


I have a shell script (count.sh) that counts the number of lines in a file. The script has been copied into HDFS, and I am currently using an Oozie workflow to execute it.

However, I was wondering whether there is a way to execute this shell script from the command line.

For example:

On Unix: [myuser@myserver ~]$ ./count.sh

What is the equivalent of this when count.sh is at the Hadoop cluster location '/user/cloudera/myscripts/count.sh'?

I read the question "Hadoop command to run bash script in hadoop cluster", but I am still unclear.


Solution

  • What you're looking for is called Hadoop Streaming. It lets any executable that reads from stdin and writes to stdout act as the mapper or reducer of a MapReduce job.

    See the official Hadoop Streaming documentation for details, or the tutorial "Writing An Hadoop MapReduce Program In Python" for a worked example (substitute your bash script for the Python scripts).
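
As a rough illustration of the streaming approach, here is a minimal sketch. It makes several assumptions: the body of count.sh is not shown in the question, so a stand-in that counts lines on stdin is used; the jar path in STREAMING_JAR is only a guess at a typical Cloudera layout and varies by distribution; and the input and output directories are hypothetical.

```shell
#!/usr/bin/env bash
# Sketch only: the script body, jar location, and HDFS paths are assumptions.

# Stand-in for count.sh: read records on stdin, print the line count on stdout.
workdir=$(mktemp -d)
cat > "$workdir/count.sh" <<'EOF'
#!/usr/bin/env bash
wc -l | tr -d '[:space:]'
echo
EOF
chmod +x "$workdir/count.sh"

# Local dry run: Hadoop Streaming feeds input lines to the script on stdin,
# so a plain pipe approximates what a single map task would see.
local_count=$(printf 'a\nb\nc\n' | "$workdir/count.sh")
echo "local count: $local_count"

# Assumed jar location; adjust for your installation.
STREAMING_JAR=${STREAMING_JAR:-/usr/lib/hadoop-mapreduce/hadoop-streaming.jar}

# Cluster run (not invoked here). The generic options -D and -files must come
# before the streaming options. -files ships the script to every task.
run_on_cluster() {
  hadoop jar "$STREAMING_JAR" \
    -D mapreduce.job.reduces=0 \
    -files "$workdir/count.sh" \
    -input /user/cloudera/input \
    -output /user/cloudera/count-output \
    -mapper count.sh
}
```

Calling run_on_cluster on a machine with a Hadoop client would submit the job. Note that with zero reducers each map task writes the count for its own input split, so a file spanning several splits yields several partial counts that still need to be summed (for example in a reducer). The streaming documentation also describes passing an hdfs:// URI to -files, which may let you reference the copy of the script already stored in HDFS.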