I understand that properties such as CRUNCH_BYTES_PER_REDUCE_TASK or mapred.reduce.tasks can be used to set the number of reducers.
Can anyone suggest how to configure or override the default number of reducers for a particular DoFn that is taking a long time to execute?
The number of reducers can be configured for a particular DoFn by building a ParallelDoOptions and passing it as the 4th argument to parallelDo (after the name, the DoFn, and the PType). Build the options like this:
ParallelDoOptions opts = ParallelDoOptions.builder().conf("mapred.reduce.tasks", "64").build();
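For completeness, here is a minimal sketch of the whole call, assuming Apache Crunch is on the classpath. SlowFn, the stage name "slow-fn", and the choice of the Writables type family are hypothetical placeholders for illustration; only the ParallelDoOptions line comes from the answer above.

```java
import org.apache.crunch.DoFn;
import org.apache.crunch.Emitter;
import org.apache.crunch.PCollection;
import org.apache.crunch.ParallelDoOptions;
import org.apache.crunch.types.writable.Writables;

public class ReducerOverrideExample {

  // Hypothetical stand-in for the slow DoFn from the question.
  static class SlowFn extends DoFn<String, String> {
    @Override
    public void process(String input, Emitter<String> emitter) {
      emitter.emit(input.toUpperCase());
    }
  }

  // Applies SlowFn to the input with a per-DoFn configuration override.
  static PCollection<String> applySlowFn(PCollection<String> input) {
    // Extra key/value pairs that Crunch adds to the configuration of the
    // job that runs this DoFn.
    ParallelDoOptions opts = ParallelDoOptions.builder()
        .conf("mapred.reduce.tasks", "64")
        .build();

    // 4-argument overload: name, DoFn, PType, ParallelDoOptions.
    return input.parallelDo("slow-fn", new SlowFn(), Writables.strings(), opts);
  }
}
```

As far as I understand, the extra configuration is applied to the MapReduce job that ends up running this DoFn, so any other stages the planner places in the same job will likely see the same setting.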