sqlsql-servert-sqlgreatest-n-per-group

Get top 1 row of each group


I have a table which I want to get the latest entry for each group. Here's the table:

DocumentStatusLogs Table

ID DocumentID Status DateCreated
2 1 S1 7/29/2011
3 1 S2 7/30/2011
6 1 S1 8/02/2011
1 2 S1 7/28/2011
4 2 S2 7/30/2011
5 2 S3 8/01/2011
6 3 S1 8/02/2011

The table will be grouped by DocumentID and sorted by DateCreated in descending order. For each DocumentID, I want to get the latest status.

My preferred output:

DocumentID Status DateCreated
1 S1 8/02/2011
2 S3 8/01/2011
3 S1 8/02/2011

Please see the parent table for more information:

Current Documents Table

DocumentID Title Content DateCreated
1 TitleA ... ...
2 TitleB ... ...
3 TitleC ... ...

Should the parent table be like this so that I can easily access its status?

DocumentID Title Content DateCreated CurrentStatus
1 TitleA ... ... s1
2 TitleB ... ... s3
3 TitleC ... ... s1

UPDATE I just learned how to use "apply" which makes it easier to address such problems.


Solution

  • WITH cte AS
    (
       SELECT *,
             ROW_NUMBER() OVER (PARTITION BY DocumentID ORDER BY DateCreated DESC) AS rn
       FROM DocumentStatusLogs
    )
    SELECT *
    FROM cte
    WHERE rn = 1
    

    If you expect 2 entries per day, then this will arbitrarily pick one. To get both entries for a day, use DENSE_RANK instead of ROW_NUMBER.

    As for normalised or not, it depends if you want to:

    As it stands, you preserve status history. If you want latest status in the parent table too (which is denormalisation) you'd need a trigger to maintain "status" in the parent. or drop this status history table.