I have a dataset that has different submissionid's and submissiondatetimes associated with invoicenumbers. I am trying to select just the row(s) with the most recent submissiondatetime for each invoicenumber. Sample data is below, the desired outcome is to only return line 2
Sample queries I have tried are as follows:
SELECT TOP 1000
S.SubissionID,
BookingID,
InvoiceNumber,
InvoiceDate,
NumberofServices,
SubmissionDateTime
FROM
[dbo].[Submission] S
JOIN
[dbo].[SubmissionInvoice] SI ON S.SubmissionID = SI.SubmissionID
INNER JOIN
(SELECT MAX(SubmissionDateTime maxdate
FROM [dbo].[Submission]) m ON m.maxdate = s.SubmissionDateTime
This only returns records with the most recent SubmissionDateTime regardless of invoicenumber or submissionid.
SELECT TOP 1000
S.SubissionID,
BookingID,
InvoiceNumber,
InvoiceDate,
NumberofServices,
SubmissionDateTime
FROM
[dbo].[Submission] S
JOIN
[dbo].[SubmissionInvoice] SI ON S.SubmissionID = SI.SubmissionID
AND SubmissionDateTime = (SELECT MAX (SubmissionDateTime)
FROM [dbo].[Submission] AS B
WHERE S.SubmissionID = B.SubmissionID)
This max date filtering didn't seem to do anything, results look like they are the same as if I didn't include that part of the query at all.
You can use window functions like row_number() if your RDBMs supports it. Here is an example
SELECT SubissionID
,BookingID
,InvoiceNumber
,InvoiceDate
,NumberofServices
,SubmissionDateTime
FROM (
SELECT S.SubissionID
,BookingID
,InvoiceNumber
,InvoiceDate
,NumberofServices
,SubmissionDateTime
,row_number() over(partition by InvoiceNumber order by SubmissionDateTime DESC) AS RN
FROM [dbo].[Submission] S
JOIN [dbo].[SubmissionInvoice] SI
on S.SubmissionID = SI.SubmissionID) t
WHERE RN = 1