hadoop mapreduce apache-crunch

How to convert existing MapReduce applications to Crunch?


I have about a dozen MapReduce jobs implemented, each of which runs as one step of a workflow driven by a simple bash script. For a variety of reasons, I would like to move the workflow to Apache Crunch.

However, it's not clear to me how to run my MapReduce jobs as Crunch functions without re-implementing them. Is there a straightforward way to use existing Mapper and Reducer implementations as Crunch functions? I would also like to keep the Tool implementations, so that each job can run both standalone and as part of the Crunch workflow. Is there any way to do this?

Thanks for any insight.


Solution

  • For anyone who stumbles across this: the Crunch libraries include an API for this, the Mapreduce class in org.apache.crunch.lib. It is minimally documented but fairly straightforward to use.

    See here: https://crunch.apache.org/apidocs/0.10.0/org/apache/crunch/lib/Mapreduce.html
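
To illustrate how that class might be used, here is a minimal sketch rather than a tested implementation. It assumes Crunch 0.10.0 and the new-API Hadoop classes (org.apache.hadoop.mapreduce) on the classpath. TokenMapper, SumReducer, CrunchWorkflow, and the two path arguments are hypothetical stand-ins for the existing jobs, written as a word count only to keep the example self-contained. The static Mapreduce.map and Mapreduce.reduce calls take the existing Mapper/Reducer class plus the output key and value classes; check the exact signatures against the Javadoc linked above.

```java
import java.io.IOException;

import org.apache.crunch.PTable;
import org.apache.crunch.Pipeline;
import org.apache.crunch.PipelineResult;
import org.apache.crunch.impl.mr.MRPipeline;
import org.apache.crunch.io.From;
import org.apache.crunch.io.To;
import org.apache.crunch.lib.Mapreduce;
import org.apache.hadoop.conf.Configured;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Mapper;
import org.apache.hadoop.mapreduce.Reducer;
import org.apache.hadoop.mapreduce.lib.input.TextInputFormat;
import org.apache.hadoop.util.Tool;
import org.apache.hadoop.util.ToolRunner;

// Hypothetical driver: runs an existing Mapper and Reducer inside a Crunch pipeline.
public class CrunchWorkflow extends Configured implements Tool {

  // Stand-in for an existing Mapper; in practice, reference your own class unchanged.
  public static class TokenMapper extends Mapper<LongWritable, Text, Text, IntWritable> {
    private static final IntWritable ONE = new IntWritable(1);
    private final Text word = new Text();

    @Override
    protected void map(LongWritable offset, Text line, Context context)
        throws IOException, InterruptedException {
      for (String token : line.toString().split("\\s+")) {
        if (!token.isEmpty()) {
          word.set(token);
          context.write(word, ONE);
        }
      }
    }
  }

  // Stand-in for an existing Reducer.
  public static class SumReducer extends Reducer<Text, IntWritable, Text, IntWritable> {
    @Override
    protected void reduce(Text key, Iterable<IntWritable> values, Context context)
        throws IOException, InterruptedException {
      int sum = 0;
      for (IntWritable v : values) {
        sum += v.get();
      }
      context.write(key, new IntWritable(sum));
    }
  }

  @Override
  public int run(String[] args) throws Exception {
    Pipeline pipeline = new MRPipeline(CrunchWorkflow.class, getConf());

    // Read (offset, line) pairs, the same input types the Mapper already expects.
    PTable<LongWritable, Text> lines = pipeline.read(
        From.formattedFile(args[0], TextInputFormat.class, LongWritable.class, Text.class));

    // Wrap the existing Mapper; the last two arguments are its output key/value classes.
    PTable<Text, IntWritable> mapped =
        Mapreduce.map(lines, TokenMapper.class, Text.class, IntWritable.class);

    // Group by key (the shuffle), then wrap the existing Reducer.
    PTable<Text, IntWritable> reduced =
        Mapreduce.reduce(mapped.groupByKey(), SumReducer.class, Text.class, IntWritable.class);

    pipeline.write(reduced, To.textFile(args[1]));
    PipelineResult result = pipeline.done();
    return result.succeeded() ? 0 : 1;
  }

  public static void main(String[] args) throws Exception {
    System.exit(ToolRunner.run(new CrunchWorkflow(), args));
  }
}
```

Because the pipeline only references the Mapper and Reducer classes, the original Tool drivers should not need to change and can presumably still be run standalone. One caveat to verify: the wrapper supplies its own Context to the wrapped classes, so code that relies on less common Context methods may behave differently than under plain MapReduce.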