hadoopmapreduceapache-crunch

Configuring number of reducers for a particular Dofn in Apache crunch


I understand that there are properties like CRUNCH_BYTES_PER_REDUCE_TASK or mapred.reduce.tasks to set number of reducers.

Can anyone suggest on configuring / overriding the default reducers for a particular Dofn which is taking more time to execute.


Solution

  • Reducers can be configured for particular DoFn by using the ParallelDoOptions and passing this as a 4th argument in parallelDo like this:

    ParallelDoOptions opts = ParallelDoOptions.builder().conf("mapred.reduce.tasks", "64").build();
    

    and pass this in parallelDo as 4th parameter.