apache-spark-sql, databricks

Extracting HOUR from an interval in Spark SQL


I was wondering how to properly extract the number of hours between two given timestamp objects.

For instance, when the following SQL query gets executed:

    select x, extract(HOUR FROM x) as result
    from
    (select (TIMESTAMP'2021-01-22T05:00:00' - TIMESTAMP'2021-01-01T09:00:00') as x)

The result value is 20, while I'd expect it to be 500.

It seems odd to me, considering that the displayed value of x itself shows the duration I'd expect.

Can anyone please explain what I'm doing wrong, and perhaps suggest another way to write the query so that it returns the desired result?

Thanks in advance!


Solution

  • The difference of two timestamps is an interval, and extract(HOUR FROM ...) returns only the hours field of that interval (the difference here is 20 days 20 hours, hence 20). So I think you have to do the maths with this one, as datediff in Spark SQL only supports days. This worked for me:

    SELECT (unix_timestamp(to_timestamp('2021-01-22T05:00:00'))
          - unix_timestamp(to_timestamp('2021-01-01T09:00:00'))) / 60 / 60 AS diffInHours
    

    My results (in a Synapse notebook, not Databricks, but I expect it to be the same):

    (screenshot of the notebook output)

    The unix_timestamp function converts a timestamp to a Unix timestamp (in seconds), which you can then do arithmetic on. Subtracting the two gives the number of seconds between the timestamps; divide by 60 for the number of minutes between the two dates, and by 60 again for the number of hours.
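
    Putting it together in the shape of the original query, a minimal sketch (start_ts and end_ts are just illustrative aliases):

        -- unix_timestamp() gives seconds since the epoch, so the difference
        -- divided by 60 / 60 is the elapsed hours (500 for these two timestamps)
        SELECT start_ts,
               end_ts,
               (unix_timestamp(end_ts) - unix_timestamp(start_ts)) / 60 / 60 AS diffInHours
        FROM
        (SELECT TIMESTAMP'2021-01-01T09:00:00' AS start_ts,
                TIMESTAMP'2021-01-22T05:00:00' AS end_ts)

    On newer runtimes (Spark 3.3+ and recent Databricks runtimes, if I remember correctly), timestampdiff(HOUR, start_ts, end_ts) should return the same thing directly, but check whether your runtime supports it.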