amazon-redshift partition-by

How to select one row per ID, chosen by the maximum date, in Amazon Redshift?


I have this data structure in my table. Each row holds one date and one status for an id.

id    date             status
1     2022-02-12         in
1     2022-02-13         out
1     2022-02-14         dlv
2     2022-02-13         in
2     2022-02-14         out
2     2022-02-15         dlv

From this table, for each id, I want the row with the latest date on or before 2022-02-14, along with its status. The desired result:

id    date             status
1     2022-02-14         dlv
2     2022-02-14         out

What I have done is:

select distinct id, date, status
from table1
where date <= ('2022-02-14')
group by date, status
order by date desc limit 1

But this query returns only one id out of the 100 ids I have in my table.

Can somebody help me get the desired result?

I am using AWS Redshift for this.

Thank you in advance


Solution

  • You need to use ROW_NUMBER() OVER for this type of query. Partition the rows by id, order each partition by date descending, and keep the row numbered 1 for each id (the latest date that passes the filter). Like so:

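    with row_order as (
      -- number the rows of each id, newest date first
      select id, date, status,
             row_number() over (partition by id order by date desc) as rn
      from table1
      where date <= '2022-02-14'
    )
    -- rn = 1 is the latest row on or before the cutoff for each id
    select id, date, status
    from row_order
    where rn = 1;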
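
To check the logic against the sample rows from the question without a Redshift cluster, here is a minimal sketch that runs the same query in an in-memory SQLite database from Python. It assumes SQLite 3.25 or newer (needed for window functions) and stores the dates as ISO text, so it is a stand-in for Redshift, not a substitute for testing there. The function name `latest_status_per_id` is made up for this example, and an `order by id` is added only to make the output order predictable.

```python
import sqlite3

# Sample rows copied from the question.
ROWS = [
    (1, "2022-02-12", "in"),
    (1, "2022-02-13", "out"),
    (1, "2022-02-14", "dlv"),
    (2, "2022-02-13", "in"),
    (2, "2022-02-14", "out"),
    (2, "2022-02-15", "dlv"),
]

# Same shape as the answer's query; "order by id" is added for stable output.
QUERY = """
with row_order as (
  select id, date, status,
         row_number() over (partition by id order by date desc) as rn
  from table1
  where date <= '2022-02-14'
)
select id, date, status
from row_order
where rn = 1
order by id
"""


def latest_status_per_id():
    """Return one (id, date, status) row per id: the latest on or before the cutoff."""
    conn = sqlite3.connect(":memory:")
    try:
        # ISO-formatted text dates compare correctly as strings in SQLite.
        conn.execute("create table table1 (id integer, date text, status text)")
        conn.executemany("insert into table1 values (?, ?, ?)", ROWS)
        return conn.execute(QUERY).fetchall()
    finally:
        conn.close()


if __name__ == "__main__":
    for row in latest_status_per_id():
        print(row)
```

The original attempt returned a single row because `limit 1` applies to the whole result set. `row_number()` restarts at 1 inside each `partition by id`, which is why filtering on `rn = 1` keeps one row per id.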