hadoop mapreduce partitioner

Why is the Partitioner invoked even with a single reducer?


If we have an MR job configured to run with only a single reducer, it seems logical that the Partitioner need not be invoked.

However, I just gave this a shot, and it looks like the Partitioner is invoked even when the job is configured with a single reducer.

Any ideas why this would be required?


Solution

  • It's because assigning each key/value pair to a particular reducer is the responsibility of the class playing the role of partitioner. Even if there is only one reducer, you still need a partitioner to assign the key/value pairs to that one reducer.

    Any default values or if-there's-only-one-reducer special cases would effectively spread the partition-assignment behavior to places outside the partitioner, which isn't really good OO design.
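
To make the design argument concrete, here is a minimal sketch in plain Java with no Hadoop dependency. The names `SimplePartitioner`, `HashSimplePartitioner`, and `assign` are hypothetical stand-ins invented for illustration, not Hadoop classes; the hash-and-modulo formula follows the usual hash partitioning approach. The point is that the calling code has no single-reducer branch: with one partition, the modulo simply yields 0 for every key.

```java
import java.util.List;

public class SingleReducerPartitionDemo {

    // Hypothetical stand-in for the partitioner role (not Hadoop's actual class).
    public interface SimplePartitioner<K> {
        int getPartition(K key, int numPartitions);
    }

    // Hash-based assignment: clear the sign bit, then take the remainder.
    public static final class HashSimplePartitioner<K> implements SimplePartitioner<K> {
        @Override
        public int getPartition(K key, int numPartitions) {
            return (key.hashCode() & Integer.MAX_VALUE) % numPartitions;
        }
    }

    // Hypothetical map-side collector: it always asks the partitioner and
    // contains no "if there is only one reducer" special case of its own.
    public static <K> int[] assign(List<K> keys,
                                   SimplePartitioner<K> partitioner,
                                   int numPartitions) {
        int[] result = new int[keys.size()];
        for (int i = 0; i < keys.size(); i++) {
            result[i] = partitioner.getPartition(keys.get(i), numPartitions);
        }
        return result;
    }

    public static void main(String[] args) {
        List<String> keys = List.of("apple", "banana", "cherry");
        // A single reducer means a single partition; x % 1 is always 0.
        int[] partitions = assign(keys, new HashSimplePartitioner<String>(), 1);
        for (int i = 0; i < keys.size(); i++) {
            System.out.println(keys.get(i) + " -> partition " + partitions[i]);
        }
    }
}
```

Run with one partition, every key lands in partition 0. Raising `numPartitions` changes the assignments without touching `assign`, which is the separation of responsibilities the answer describes.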