So I have data as follows:
USER TIMESTAMP DATA
0001 2021-05-18 20:40:06.251 alpha
0002 2021-05-18 22:40:06.251 beta
0002 2021-05-18 21:40:06.251 gamma
0003 2021-05-18 19:40:06.251 delta
0004 2021-05-18 01:40:06.251 epsilon
0003 2021-05-18 20:40:06.251 zeta
And I want to create a list of all DISTINCT USER names as well as their associated DATA of the FIRST TIMESTAMP.
The result should look like this:
USER TIMESTAMP DATA
0001 2021-05-18 20:40:06.251 alpha
0002 2021-05-18 21:40:06.251 gamma
0003 2021-05-18 19:40:06.251 delta
0004 2021-05-18 01:40:06.251 epsilon
Now this question is resolved by doing the following:
SELECT *
FROM df a
WHERE a.TIMESTAMP IN (
SELECT min(TIMESTAMP)
FROM df AS b
WHERE a.USER = b.USER
)
BUT, this is computationally taxing on my server so I need a different approach. I have been trying joining the df on itself ON USER AND TIMESTAMP but I haven't gotten it yet, any idea for a faster method? I am doing this in Snowflake btw...
You might try window functions;
select df.*
from (select df.*,
row_number() over (partition by user order by timestamp) as seqnum
from df
) df
where seqnum = 1;