Context:
Issue:
Things I've seen:
Example of an Oozie pipeline:
Java_Action_1 (runs a Java class)
Java_Action_2 (runs a Java class)
Java_Action_3 (runs a Java class)
Subworkflow_1 (contains a fork and join step, as seen in the Oozie UI)
    Java_Action_1_in_subworkflow (runs a Java class) -> the job that was not writing to HDFS
    Java_Action_1_in_subworkflow (runs a Java class)
Java_Action_4 (runs a Java class)
Java_Action_5 (runs a Java class)
etc.
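For reference, the pipeline above roughly corresponds to a workflow definition of the following shape. This is only an illustrative sketch: the action names come from the list above, but the main classes, `${...}` parameters, and app path are hypothetical placeholders, not the real workflow.

```xml
<workflow-app name="example-pipeline" xmlns="uri:oozie:workflow:0.5">
    <start to="Java_Action_1"/>

    <action name="Java_Action_1">
        <java>
            <job-tracker>${jobTracker}</job-tracker>
            <name-node>${nameNode}</name-node>
            <main-class>com.example.Action1</main-class> <!-- hypothetical class -->
        </java>
        <ok to="Java_Action_2"/>
        <error to="fail"/>
    </action>

    <!-- Java_Action_2 and Java_Action_3 follow the same pattern -->

    <action name="Subworkflow_1">
        <sub-workflow>
            <!-- the sub-workflow contains the fork/join and the two inner Java actions -->
            <app-path>${wfAppPath}/subworkflow_1</app-path> <!-- hypothetical path -->
            <propagate-configuration/>
        </sub-workflow>
        <ok to="Java_Action_4"/>
        <error to="fail"/>
    </action>

    <kill name="fail">
        <message>Failed: ${wf:errorMessage(wf:lastErrorNode())}</message>
    </kill>
    <end name="end"/>
</workflow-app>
```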
The issue was with the fs.defaultFS Hadoop property. We were using viewfs, so the output paths handed to Apache Crunch were prefixed with viewfs://, and with that prefix the job could not write to HDFS. The fix was to set fs.defaultFS to hdfs:// for the writing phase. Reading is from an S3 bucket that is mounted as /folder_name on HDFS, so for the reading phase the input paths still had to be prefixed with viewfs://.
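A minimal sketch of the fix as a configuration override on the writing action. The namenode authority (`namenode-host:8020`) and the viewfs mount name (`my-cluster`) are hypothetical placeholders; the point is only that fs.defaultFS is switched from the viewfs root to a plain hdfs:// URI so that Crunch's output paths resolve against HDFS, while read paths keep the viewfs:// prefix:

```xml
<configuration>
    <!-- Override the cluster-wide viewfs default so unprefixed/relative
         output paths resolve to HDFS during the writing phase -->
    <property>
        <name>fs.defaultFS</name>
        <value>hdfs://namenode-host:8020</value> <!-- hypothetical authority -->
    </property>
</configuration>
```

Input paths for the reading phase would still look like `viewfs://my-cluster/folder_name/...`, going through the viewfs mount of the S3 bucket.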