I have a table which I want to get the latest entry for each group. Here's the table:
DocumentStatusLogs
Table
ID | DocumentID | Status | DateCreated |
---|---|---|---|
2 | 1 | S1 | 7/29/2011 |
3 | 1 | S2 | 7/30/2011 |
6 | 1 | S1 | 8/02/2011 |
1 | 2 | S1 | 7/28/2011 |
4 | 2 | S2 | 7/30/2011 |
5 | 2 | S3 | 8/01/2011 |
6 | 3 | S1 | 8/02/2011 |
The table will be grouped by DocumentID
and sorted by DateCreated
in descending order. For each DocumentID
, I want to get the latest status.
My preferred output:
DocumentID | Status | DateCreated |
---|---|---|
1 | S1 | 8/02/2011 |
2 | S3 | 8/01/2011 |
3 | S1 | 8/02/2011 |
Is there any aggregate function to get only the top from each group? See pseudo-code GetOnlyTheTop
below:
SELECT
DocumentID,
GetOnlyTheTop(Status),
GetOnlyTheTop(DateCreated)
FROM DocumentStatusLogs
GROUP BY DocumentID
ORDER BY DateCreated DESC
If such function doesn't exist, is there any way I can achieve the output I want?
Or at the first place, could this be caused by unnormalized database? I'm thinking, since what I'm looking for is just one row, should that status
also be located in the parent table?
Please see the parent table for more information:
Current Documents
Table
DocumentID | Title | Content | DateCreated |
---|---|---|---|
1 | TitleA | ... | ... |
2 | TitleB | ... | ... |
3 | TitleC | ... | ... |
Should the parent table be like this so that I can easily access its status?
DocumentID | Title | Content | DateCreated | CurrentStatus |
---|---|---|---|---|
1 | TitleA | ... | ... | s1 |
2 | TitleB | ... | ... | s3 |
3 | TitleC | ... | ... | s1 |
UPDATE I just learned how to use "apply" which makes it easier to address such problems.
WITH cte AS
(
SELECT *,
ROW_NUMBER() OVER (PARTITION BY DocumentID ORDER BY DateCreated DESC) AS rn
FROM DocumentStatusLogs
)
SELECT *
FROM cte
WHERE rn = 1
If you expect 2 entries per day, then this will arbitrarily pick one. To get both entries for a day, use DENSE_RANK instead of ROW_NUMBER.
As for normalised or not, it depends if you want to:
As it stands, you preserve status history. If you want latest status in the parent table too (which is denormalisation) you'd need a trigger to maintain "status" in the parent. or drop this status history table.