apache hadoop hive partitioning buckets

Why should the number of buckets in Hive be equal to the number of reducers?


In Hive, why should the number of buckets in a table be equal to the number of reducers?


Solution

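  • Because this is the most efficient arrangement for MapReduce (all else being equal). When data is inserted into a bucketed table, rows are hashed on the bucketing column and each reducer writes one bucket file, so with one reducer per bucket the work is divided evenly among the reducers.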

    In Hive 0.x and 1.x you have to set hive.enforce.bucketing = true. With this setting, Hive automatically chooses the number of reducers for an insert based on the number of buckets in the target table. In later versions of Hive (2.x), bucketing is always enforced, so the setting is no longer needed.

    Source: https://cwiki.apache.org/confluence/display/Hive/LanguageManual+DDL+BucketedTables
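
    As a rough illustration of the setting described above, the following HiveQL sketch populates a bucketed table. The table and column names (user_info, user_info_bucketed, user_id, and so on) and the bucket count of 32 are hypothetical and chosen only for this example. The commented-out statements show the manual alternative for Hive 0.x and 1.x, assuming the enforce setting is left off.

    ```sql
    -- Hypothetical target table: rows are hashed on user_id into 32 buckets.
    CREATE TABLE user_info_bucketed (
      user_id   BIGINT,
      firstname STRING,
      lastname  STRING
    )
    CLUSTERED BY (user_id) INTO 32 BUCKETS;

    -- Hive 0.x and 1.x only: let Hive derive the reducer count (32 here)
    -- and the clustering column from the table definition.
    SET hive.enforce.bucketing = true;

    -- user_info is a hypothetical, non-bucketed source table.
    INSERT OVERWRITE TABLE user_info_bucketed
    SELECT user_id, firstname, lastname
    FROM user_info;

    -- Manual alternative without hive.enforce.bucketing:
    -- match the reducer count to the bucket count and cluster the rows yourself.
    -- SET mapred.reduce.tasks = 32;
    -- INSERT OVERWRITE TABLE user_info_bucketed
    -- SELECT user_id, firstname, lastname
    -- FROM user_info
    -- CLUSTER BY user_id;
    ```

    In either variant, the goal is that each of the 32 reducers ends up writing exactly one bucket file.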