Tags: sql, pyspark, databricks, time, date

How to find duration in seconds from an interval day to second object in Databricks


In a Databricks SQL query, I am creating a column that holds the difference between the Timestamps of successive rows. I use something like the following:

select
  (Timestamp - (LAG(Timestamp, 1) OVER (partition by colA order by Timestamp))) tdiff,
  * 
from 
  the_table

But this gives me a value of type interval day to second, rendered like {"seconds": 180, "nano": 0, "negative": false, "zero": false, "units": ["SECONDS", "NANOS"]}

How can I convert it to seconds? For this example, I should see 180.


Solution

  • Looks like even in earlier runtimes (tested at least on 13.2) datediff supports the SECOND unit.

    https://docs.databricks.com/en/sql/language-manual/functions/datediff3.html
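
    For instance, a minimal standalone check of the three-argument form
    datediff(unit, startTimestamp, endTimestamp), with literal timestamps
    chosen to mirror the 180-second example from the question:

    %sql
    select
      datediff(
        second,
        timestamp'2024-01-01 00:00:00',
        timestamp'2024-01-01 00:03:00'
      ) as diff_seconds -- returns 180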

    The query below just generates some ids, each with multiple rows and a fuzzed time, to show the syntax.

    It also shows how to get the same value back from the interval subtraction the question started with. Note that extract(second from ...) returns only the seconds field of the interval, so the day, hour, minute, and second fields have to be combined to get the total duration.

    %sql
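    -- 1000 ids, each paired with a random offset of 0-999 seconds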
    with some_ids_and_random_ints as (
      select
        id,
        floor(rand(5) * 1000) random_int
      from
        range(1000) AS t
    ),
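    -- fold the ids down to 10 distinct values so each has ~100 rows,
    -- each with current_timestamp() fuzzed by the random offset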
    force_dups_on_id as (
      select
        mod(id, 10) as id_with_dups,
        timestampadd(second, random_int, current_timestamp()) as some_time
      from
        some_ids_and_random_ints
    ),
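    -- compute the gap to the previous row per id in two ways:
    -- the inbuilt datediff and plain timestamp subtraction (an interval)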
    result_compare as (
    select
      *,
      datediff(
        second,
        lag(some_time) over (
          partition by id_with_dups
          order by
            some_time
        ),
        some_time
      ) as diff_inbuilt_function,
      (some_time - (LAG(some_time, 1) OVER (partition by id_with_dups order by some_time))) tdiff_as_interval
    from
      force_dups_on_id
    )
    select id_with_dups,
           some_time,
           diff_inbuilt_function,
           -- extract(second from ...) alone returns just the seconds field of the
           -- interval, so combine all of its fields for the total duration
           cast(extract(day from tdiff_as_interval) * 86400
                + extract(hour from tdiff_as_interval) * 3600
                + extract(minute from tdiff_as_interval) * 60
                + extract(second from tdiff_as_interval) as integer) as seconds_from_interval
      from result_compare
     order by id_with_dups,
              some_time
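
    As an aside, if whole seconds are all you need, you can sidestep the
    interval type entirely by differencing unix_timestamp values (which are
    epoch seconds). Rewritten in terms of the question's original query:

    %sql
    select
      (unix_timestamp(Timestamp) - unix_timestamp(
        LAG(Timestamp, 1) OVER (partition by colA order by Timestamp)
      )) as tdiff_seconds,
      *
    from
      the_table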