hive, hiveql, hive-query

Create range bins in Hive for histograms


I have a dataset that contains stud_id and each student's age. I want the ages arranged into ranges (bins) with a bucket size of 10.

stud_id    ages
101        11
102        13
103        21
104        25

I have similar data for many more records, and all of it has to be binned with a bin size of 10.

The expected output is:

stud_id     ages_bin
101         11-20
102         11-20
103         21-30
104         21-30
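The expected labels follow a fixed rule. For an integer age $a \ge 11$, the bin bounds can be written as below (the CASE statement further down keeps 0-10 as one wider bin, so ages 0 to 10 are outside this formula):

```latex
\text{lower}(a) = 10\left\lfloor \frac{a - 1}{10} \right\rfloor + 1,
\qquad
\text{upper}(a) = \text{lower}(a) + 9
```

For example, $a = 25$ gives $\lfloor 24/10 \rfloor = 2$, so the bin is 21-30, and $a = 20$ gives $\lfloor 19/10 \rfloor = 1$, so the bin is 11-20.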

I tried a simple CASE statement in Hive:

select stud_id,
case when ages between 0 and 10 then '0-10'
when ages between 11 and 20 then '11-20'
when ages between 21 and 30 then '21-30'
when ages between 31 and 40 then '31-40'
when ages between 41 and 50 then '41-50'
when ages between 51 and 60 then '51-60'
when ages between 61 and 70 then '61-70'
when ages between 71 and 80 then '71-80'
when ages between 81 and 90 then '81-90'
when ages between 91 and 100 then '91-100'
when ages between 101 and 110 then '101-110'
when ages between 111 and 120 then '111-120'
when ages between 121 and 130 then '121-130'
when ages between 131 and 140 then '131-140'
when ages between 141 and 150 then '141-150'
else NULL end as ages_bin
from students

Is there a simpler way to bin the data with a bucket size of 10?

Can someone help me write a simpler query?


Solution

  • A simpler method is to compute each bin's bounds arithmetically with floor() instead of listing every range in a CASE statement. Here is the code:

    select stud_id,
           concat(cast(floor(ages/10)*10 as string), '-',
                  cast(floor(ages/10)*10 + 9 as string)) as ages_bin
    from students;
    

    This produces the following output:

    stud_id     ages_bin
    101         10-19
    102         10-19
    103         20-29
    104         20-29
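The floor() approach labels the bins 10-19, 20-29, and so on, while the question asked for 11-20, 21-30. A minimal sketch of a variant that reproduces the requested labels is to subtract 1 before dividing. It assumes `ages` is a non-negative INT column of `students`, as in the sample data, and routes ages 0 to 10 into a single '0-10' bin to match the question's CASE statement. The second query is an illustrative addition that counts students per bin, which is the data a histogram needs; `min_age` is a hypothetical helper column used only to sort the bins numerically, since sorting the string labels would put '101-110' before '11-20'.

```sql
-- Sketch: label bins as 11-20, 21-30, ... to match the expected output.
-- Assumes `ages` is a non-negative INT column; NULL ages yield a NULL bin.
select stud_id,
       case
         when ages <= 10 then '0-10'
         else concat(cast(floor((ages - 1) / 10) * 10 + 1 as string), '-',
                     cast(floor((ages - 1) / 10) * 10 + 10 as string))
       end as ages_bin
from students;

-- Histogram counts: number of students per bin (illustrative addition).
-- min_age is a hypothetical helper column used to order bins numerically.
select ages_bin, min(ages) as min_age, count(*) as num_students
from (
  select ages,
         case
           when ages <= 10 then '0-10'
           else concat(cast(floor((ages - 1) / 10) * 10 + 1 as string), '-',
                       cast(floor((ages - 1) / 10) * 10 + 10 as string))
         end as ages_bin
  from students
) binned
group by ages_bin
order by min_age;
```

With the sample data, students 101 and 102 land in 11-20 and students 103 and 104 in 21-30. Changing the bucket size means replacing the constants 10, 1, and 10 in the bound expressions, which is still a single expression rather than one CASE branch per range.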