hadoopmapreduceamazon-emroozieapache-crunch

Apache Crunch Job On AWS EMR using Oozie


Context:

Issue:

Things I've seen:

Exemple of oozie pipeline:


Solution

  • The issue was with the fs.defaultFS hadoop property. We were using viewfs and the output paths that were given to apache crunch were prefixed with viewfs:// . Because of this it was not able to write to HDFS. So we set the defaultFS to hdfs:// for the writing phase. The reading is from s3 bucket which is mounted as /folder_name on hdfs. For the reading phase the files had to be prefixed with viewfs://.