
How to find the exact hadoop jar command that ran my job?


I'm using CDH 5.4. I'm running a Hadoop job that appears to work fine from the command line (when simply run with hadoop jar). However, if I run it from YARN via Oozie, it finishes silently with a single mapper and no reducers. I strongly suspect both runs used the exact same command, but I want to be sure of that. (Note: it's a Scalding job with a custom runner, and all is fine when I run it from the command line.) So I looked at the logs at:

/container_1432733015407_0953_01_000001/container_1432733015407_0953_01_000001/user/stdout/?start=0

and I saw something like:

Main class        : org.apache.oozie.action.hadoop.JavaMain

Maximum output    : 2048

Arguments         :
                    -D
                    oneparam=value
                    -D
                    secondparam=value

So I took these and turned them into a command line, which I ran with something like:

hadoop jar MyScaldingRunner -D oneparam=value -D secondparam=value

It ran just fine and produced the results.

Is there a way for me to view the exact hadoop jar command line that Hadoop ran when the job was executed via Oozie + YARN? From there it just finishes silently!


Solution

  • I don't have a direct answer to your question, but JDiagnostics could help you recreate the parameters needed, such as the classpath or environment variables. Here is an example you can put at the beginning of your program before you run it:

      LOG.info(new DefaultQuery().call())
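If adding a library is not an option, roughly the same information can be collected with plain JDK calls. The following is a minimal sketch, not what JDiagnostics does internally: the class name LaunchDiagnostics and its describe method are made up for illustration, and it assumes you can edit the runner's main method. Since the log shows org.apache.oozie.action.hadoop.JavaMain as the main class, there may be no literal hadoop jar command line to recover, so comparing what the JVM actually received in both runs is probably the closest substitute.

```java
import java.lang.management.ManagementFactory;
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical helper (not part of Hadoop, Oozie, or JDiagnostics).
public class LaunchDiagnostics {

    /** Builds a multi-line report of how the current JVM was launched. */
    public static String describe(String[] args) {
        StringBuilder sb = new StringBuilder();
        // Arguments handed to main(), i.e. what follows the class name.
        sb.append("program args    : ").append(Arrays.toString(args)).append('\n');
        // Flags passed to the java binary itself, such as -Xmx or -Dkey=value.
        List<String> jvmArgs = ManagementFactory.getRuntimeMXBean().getInputArguments();
        sb.append("jvm args        : ").append(jvmArgs).append('\n');
        sb.append("java.class.path : ").append(System.getProperty("java.class.path")).append('\n');
        sb.append("user.dir        : ").append(System.getProperty("user.dir")).append('\n');
        // Sorted copy of the environment so two runs can be diffed line by line.
        for (Map.Entry<String, String> e : new TreeMap<>(System.getenv()).entrySet()) {
            sb.append("env ").append(e.getKey()).append('=').append(e.getValue()).append('\n');
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        System.out.println(describe(args));
    }
}
```

Calling something like LOG.info(LaunchDiagnostics.describe(args)) as the first line of the runner, once from the command line and once through Oozie, gives two reports that can be diffed. Keep in mind that the environment dump may contain credentials, so trim it before sharing the logs.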