hivehue

<Issue> Float data type in Hive


Init data:

CREATE TABLE `test.test_data`(
    user VARCHAR(10),
    amount FLOAT) 
TBLPROPERTIES ('transactional'='true');
INSERT INTO `test.test_data`
SELECT 'sonnh', 300000000000
UNION ALL
SELECT 'sonnh1', 1000000000000
UNION ALL
SELECT 'sonnh2', 900000000000;

Get data:

SELECT * FROM `test.test_data`

Results:

test.test_data.user    test.test_data.amount
sonnh                  299999986000
sonnh1                 1000000000000
sonnh2                 899999990000

I have a problem when use float data type like above. Someone explain it to me.


Solution

  • i think this goes back to basics of how hive store float, double, and decimal.

    So it is always advisable to use decimal in case you need financial applications and use float or double when you dont care about rounding issues.

    Here is an experimentation with above three data types.

    CREATE TABLE tmp(   a string,     f FLOAT,    db double,    dc decimal(5,4)    )  ;
        insert into tmp SELECT 'id', 900000000000000000,900000000000000000,123.123 ;
        insert into tmp SELECT 'id', 90000000,99900000000000000000,99.999 ;
        insert into tmp SELECT 'id', 99123.456,9000000123.456,9.9999999 ;
        insert into tmp SELECT 'id', 9999999123.456,500000,123.123 ;
        insert into tmp SELECT 'id', 9999999000.456,99900000000000000000999999999999999.12349,9990.099999 ;
        insert into tmp SELECT 'id', 9999999000.456,9990000000000000000099999999999.12349,90.999 ;
        
        select * from tmp;
    
    tmp.a   tmp.f               tmp.db                      tmp.dc  
    id      900000020000000000  900000000000000000          123.12300   -- notice that float column cant handle the data and showing different data.
    id      90000000            99900000000000000000        99.99900    -- everything is fine because they are within range.
    id      99123.45            9000000123.456              10.00000    -- decimal approximated the data to 10 which is very close to actual value.
    id      9999999000          500000                      123.12300   --all good
    id      9999999000          9.99e+34                    NULL        --Decimal put null because it cant handle the data. Double approximated the data to E34. 
    id      9999999000          9.99e+30                    90.99900    --Decimal is good because data is within range. Double approximated the data to E34.