datehadooptimestamptype-conversionimpala

Hadoop Impala: Format datatype integer to date/timestamp to use addtime function


I am working in Impala with the following table:

customer_id | day_id    | return_day_id
ABC           20170830    20170923
BCD           20170830    20170901

Unfortunately, both the day_id & return_day_id fields are INT and not date.

How can I change their datatype to be date, so that I can count distinct customer_ids only with return_day_id within 4 days after day_id. Would I need to cast it to date and then to timestamp so that I can use the adddate function?


Solution

  • What one of the comments pointed out correctly, is that you need to use unix_timestamp and from_unixtime, however you don't need to use cast. Many other questions on stackoverflow encountered similar issues, each attempting to use cast, but that in many cases ended up returning NULL values or not working.

    The below should give you the time object you need, in the format of YYYY-MM-dd and you can optionally also add the hh-mm-ss to it.

    SELECT TBL.day_id, from_unixtime(unix_timestamp(cast (TBL.day_id as string), "yyyyMMdd"))
    
    FROM yourTable as TBL
    
    LIMIT 10 
    

    One small consideration to keep in mind is that this converts a string into the date, so we cast it as string at cast (TBL.day_id as string), "yyyyMMdd"))

    Example:

    20160220 -> 2016-02-20 00:00:00

    You can find more info in the documentation here.

    Hope that helps, Julian