sqltimestampsnowflake-cloud-data-platformdistinctansi-sql

How to select the first timestamp that appears in a distinct query in SQL?


So I have data as follows:

USER                TIMESTAMP    DATA
0001  2021-05-18 20:40:06.251   alpha
0002  2021-05-18 22:40:06.251    beta
0002  2021-05-18 21:40:06.251   gamma
0003  2021-05-18 19:40:06.251   delta
0004  2021-05-18 01:40:06.251 epsilon
0003  2021-05-18 20:40:06.251    zeta

And I want to create a list of all DISTINCT USER names as well as their associated DATA of the FIRST TIMESTAMP.

The result should look like this:

USER                TIMESTAMP    DATA
0001  2021-05-18 20:40:06.251   alpha
0002  2021-05-18 21:40:06.251   gamma
0003  2021-05-18 19:40:06.251   delta
0004  2021-05-18 01:40:06.251 epsilon

Now this question is resolved by doing the following:

    SELECT *
    FROM df a
    WHERE a.TIMESTAMP IN (
            SELECT min(TIMESTAMP)
            FROM df AS b
            WHERE a.USER = b.USER
        )

BUT, this is computationally taxing on my server so I need a different approach. I have been trying joining the df on itself ON USER AND TIMESTAMP but I haven't gotten it yet, any idea for a faster method? I am doing this in Snowflake btw...


Solution

  • You might try window functions;

    select df.*
    from (select df.*,
                 row_number() over (partition by user order by timestamp) as seqnum
          from df
         ) df
    where seqnum = 1;