hivehiveqlhive-partitionshiveddl

How to partition a Hive Table using range of values for a column


I have a Hive Table with 2 columns.Employee ID and Salary.

Data is something like given below.

Employee ID Salary
1   10000.08
2   20078.67
3   20056.45
4   30000.76
5   10045.14
6   43567.76

I want to create Partitions based on Salary Column.For Example Partition for salary range 10000 to 20000, 20001 to 30000.

How do i achieve this.


Solution

  • Hive does not support range partitioning, but you can calculate ranges during data load.

    1. Create table partitioned by salary_range:

      create table your_table
      (
       employee_id bigint,
       salary double
      )
      partitioned by (salary_range bigint)
      
    2. insert using case for salary range calculation:

      insert overwrite table your_table partition (salary_range)   
      select employee_id, salary,  
             case 
                 when salary between 10000 and 20000 then 20000
                 when salary between 20001 and 30000 then 30000 
                 ...
                 else ...
             end as salary_range 
      from some_table;