I am working in Impala with the following table:
customer_id | day_id | return_day_id
ABC 20170830 20170923
BCD 20170830 20170901
Unfortunately, both the day_id & return_day_id fields are INT and not date.
How can I change their datatype to be date, so that I can count distinct customer_ids only with return_day_id within 4 days after day_id. Would I need to cast it to date and then to timestamp so that I can use the adddate function?
What one of the comments pointed out correctly, is that you need to use unix_timestamp
and from_unixtime
, however you don't need to use cast. Many other questions on stackoverflow encountered similar issues, each attempting to use cast, but that in many cases ended up returning NULL values or not working.
The below should give you the time object you need, in the format of YYYY-MM-dd and you can optionally also add the hh-mm-ss to it.
SELECT TBL.day_id, from_unixtime(unix_timestamp(cast (TBL.day_id as string), "yyyyMMdd"))
FROM yourTable as TBL
LIMIT 10
One small consideration to keep in mind is that this converts a string into the date, so we cast it as string at cast (TBL.day_id as string), "yyyyMMdd"))
Example:
20160220 -> 2016-02-20 00:00:00
You can find more info in the documentation here.
Hope that helps, Julian