scala hadoop scalding

On which Hadoop node would the Scalding pre-process and post-process below run?


I have the example code below, which does some pre-processing before a Scalding job runs and some post-processing afterwards. Because the pre-processing and post-processing call a MySQL database, I would like to know on which Hadoop nodes they could potentially run (I need to open the database port to those nodes). Could they run on any Hadoop datanode? I did some research but could not find any indication. How can I find out, from the documentation or the sources, on which node this code would run? (PS: the jobs are scheduled with Oozie.)

  preProcessingBeforeJobRuns() // **in which hadoop node would this be run? could it run on any datanode?**
  log.info(s"ABOUT TO RUN JOB with input $jobInput")
  val scaldingTool = new Tool
  scaldingTool.setJobConstructor(createJob(jobInput))
  val parser: GenericOptionsParser = new GenericOptionsParser(new Configuration(), args)
  scaldingTool.setConf(parser.getConfiguration)
  log.info(s"CALLING SCALDING RUN with args: ${args.toList.mkString(" ")}")
  val status = scaldingTool.run(args)
  log.info("FINISHED RUNNING JOB!")
  somePostJobProcessing() // **in which hadoop node would this be run? could it run on any datanode?**

Solution

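  • The code you've posted runs in the driver process, on the node that launches the job, which is the Hadoop master node when you start it there. It does not run inside the map/reduce tasks. scaldingTool.run(args) submits your job, and only the tasks of that job execute on the task nodes. Since the job is scheduled with Oozie, keep in mind that an Oozie Java action runs its main class inside a launcher map task, so the driver (and with it the pre-processing and post-processing) may end up on any worker node of the cluster.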
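
If you want to verify this on your own cluster rather than rely on documentation, one option is to log the host name from the driver code itself, right next to the pre-processing and post-processing calls. The following is only a minimal sketch: the object name DriverHost and the helper currentHost are hypothetical and not part of Scalding or Hadoop; the only library call used is java.net.InetAddress.getLocalHost from the JDK.

```scala
import java.net.InetAddress
import scala.util.Try

// Hypothetical helper, not part of Scalding or Hadoop.
object DriverHost {
  // Returns the host name of the machine running this JVM,
  // or "unknown" if the local host name cannot be resolved.
  def currentHost(): String =
    Try(InetAddress.getLocalHost.getHostName).getOrElse("unknown")

  def main(args: Array[String]): Unit = {
    // In the real driver, a line like this would sit next to
    // preProcessingBeforeJobRuns() and somePostJobProcessing().
    println(s"Driver code is running on host: ${currentHost()}")
  }
}
```

Running the workflow a few times and comparing the logged host names should show whether the driver always lands on the same node or moves around the cluster, which tells you which nodes need access to the database port.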