I am trying to get an output using SQL Server 2016 that will return StudyID
's that only have an AssignmentType
of "Final" i.e. no AssignmentType
of "Draft".
Some studies may have a "Draft" followed by a "Final". I do not want these. I purely want studyID
's that have 1 AssignmentType
which is "Final".
I currently get this data being returned:
StudyID | ClientName | DateSign | SchedStart | ActStart | ActEnd | SchedEnd | AssignmentType |
---|---|---|---|---|---|---|---|
1 | Client1 | NULL | 2022-01-05 | NULL | NULL | NULL | Final |
2 | Client2 | NULL | 2023-04-15 | NULL | NULL | NULL | Final |
3 | Client3 | NULL | 2024-01-10 | NULL | NULL | NULL | Draft |
3 | Client3 | NULL | 2024-01-25 | NULL | NULL | NULL | Final |
4 | Client4 | NULL | 2023-12-10 | NULL | NULL | NULL | Final |
5 | Client5 | NULL | 2023-12-31 | NULL | NULL | NULL | Draft |
5 | Client5 | NULL | 2024-01-20 | NULL | NULL | NULL | Final |
6 | Client6 | NULL | 2022-10-10 | NULL | NULL | NULL | Final |
7 | Client7 | NULL | 2023-09-15 | NULL | NULL | NULL | Final |
I want the output to look like this:
StudyID | ClientName | DateSign | SchedStart | ActStart | ActEnd | SchedEnd | AssignmentType |
---|---|---|---|---|---|---|---|
1 | Client1 | NULL | 2022-01-05 | NULL | NULL | NULL | Final |
2 | Client2 | NULL | 2023-04-15 | NULL | NULL | NULL | Final |
4 | Client4 | NULL | 2023-12-10 | NULL | NULL | NULL | Final |
6 | Client6 | NULL | 2022-10-10 | NULL | NULL | NULL | Final |
7 | Client7 | NULL | 2023-09-15 | NULL | NULL | NULL | Final |
My current code (below) returns about ~1500 rows when it should only end up being around 20-30 rows.
My 2 requirements are below and feel like they're just saying the same thing (I pretty much covered these above but maybe I am reading wrong/misinterpreting):
studyid
's that have a "Final" scheduled but to draft is scheduled to be completed or has been completed in the past.I was thinking I could use one of the rank functions, then pick out only the records that return a "1" for AssignmentType
but I am not quite sure how I would implement either of these in my code.
Here is my code:
SELECT DISTINCT
S.StudyNo AS StudyID,
S.Status,
S.ProtocolSign,
S.Client AS ClientName,
S.FinalRptSign as DateSign,
DT.scheduledstartdate AS SchedStart,
DT.actualstartdate AS ActStart,
DT.actualenddate AS ActEnd,
DT.scheduledenddate AS SchedEnd,
DT.[comments] AS AssignmentType
FROM
dbo.View_RptStudyModule S
JOIN
[dbo].[View_RptPhase] P ON P.studyrowid = S.StudyRowId
JOIN
[dbo].[View_Proposal] VP ON VP.[StudyRowId] = S.StudyRowId
JOIN
[dbo].[View_Costcenter] CC ON CC.[ccid]=VP.[CCId]
JOIN
(SELECT
[datetypetxt],
[PhaseRowId],
[actualstartdate],
[actualenddate],
[scheduledstartdate],
[scheduledenddate] = CASE WHEN scheduledenddate < projectedenddate THEN projectedenddate ELSE scheduledenddate END,
[projectedenddate],
[comments],
DD.Detail,
DD.DateDetailComments
FROM
[dbo].[View_Datetracked] DT
JOIN
(SELECT
[DateTrackedRowId],
[Detail],
[DateDetailComments]
FROM
[dbo].[View_RptDateDetail]) DD ON DD.DateTrackedRowId = DT.datetrackedrowid
WHERE
[scheduledstartdate] IS NOT NULL
AND [datetypetxt] = 'Dataset SAS Files'
AND [actualenddate] IS NULL AND [scheduledenddate] IS NULL) DT ON DT.PhaseRowId = P.phaserowid
WHERE
P.PhaseType = 'Dataset'
AND P.PhaseCostCenter IN ('1', '2', '3', '4', '5','6','7','8')
AND DT.[scheduledstartdate] > '2021-01-01'
AND S.FinalRptSign IS NULL
GROUP BY
S.StudyNo, S.Status, S.ProtocolSign,
S.Client, S.Duration, S.FinalRptSign,
DT.scheduledstartdate, DT.actualstartdate, DT.actualenddate, DT.scheduledenddate,
P.PhaseCostCenter,
DT.[comments]
ORDER BY
S.StudyNo ASC, DT.[comments]
I have seen other posts on here explaining the differences between Row_Number and different Rank functions and for some reason it isn't clicking in my head. Would there be any difference?
I've tried this and variations of this but it just returns my current output + "Ranking" column which seems to just literally be ranking them from 1 up until ~1500:
SELECT DISTINCT
S.StudyNo AS StudyID,
S.Status,
S.ProtocolSign,
S.Client AS ClientName,
S.FinalRptSign as DateSign,
DT.scheduledstartdate AS SchedStart,
DT.actualstartdate AS ActStart,
DT.actualenddate AS ActEnd,
DT.scheduledenddate AS SchedEnd,
DT.[comments] AS AssignmentType,
DENSE_RANK() OVER(PARTITION BY DT.[comments] Order by S.StudyNo DESC) ranking
FROM dbo.View_RptStudyModule S
Any help or clarification is greatly appreciated!
EDIT:
I've added in row_number vs dense_rank and got this as the top of my script now:
SELECT DISTINCT
S.StudyNo AS Study,
S.Status,
S.ProtocolSign,
S.Client AS ClientName,
S.FinalRptSign,
DT.scheduledstartdate AS SchedStart,
DT.actualstartdate AS ActStart,
DT.actualenddate AS ActEnd,
DT.scheduledenddate AS SchedEnd,
DT.[comments] AS AssignmentType,
ROW_NUMBER() OVER (PARTITION BY S.StudyNo ORDER BY DT.[comments] DESC) AS RN
FROM dbo.View_RptStudyModule S
My column RN now contains 1 or 2 and where 2 is present, it is a second instance of AssignmentType. Just need to figure out how to capture this and it should be it
Edit 2:
Condensed version that runs but no output
SELECT *
FROM (
SELECT DISTINCT
S.StudyNo as StudyID,
S.Client AS ClientName,
S.FinalRptSign AS DateSign,
DT.scheduledstartdate AS SchedStart,
P.PhaseCostCenter,
DT.[comments] as AssignmentType
FROM dbo.View_RptStudyModule S
JOIN [dbo].[View_RptPhase] P ON P.studyrowid = S.StudyRowId
JOIN (SELECT [datetypetxt],
[PhaseRowId],
[scheduledstartdate],
[comments] FROM [dbo].[View_Datetracked]
WHERE [scheduledstartdate] IS NOT NULL
AND [datetypetxt] = 'Dataset SAS Files'
AND comments LIKE '%SEND%'
AND [actualenddate] IS NULL) DT ON DT.PhaseRowId = P.phaserowid
WHERE
P.PhaseType = 'Dataset'
AND P.PhaseCostCenter IN ('1', '2', '3', '4', '5','6','7','8')
AND DT.[scheduledstartdate] > '2021-01-01'
AND S.FinalRptSign IS NULL
Group By
S.StudyNo,
S.Client,
S.FinalRptSign,
DT.scheduledstartdate,
P.PhaseCostCenter,
DT.[comments]
HAVING COUNT(*) = 1
AND MAX(DT.[comments]) = 'Final'
) SELECTED
JOIN dbo.View_RptStudyModule S ON S.StudyNo = SELECTED.StudyNo
Although your question title references DENSE_RANK()
it appears that your real question is, I purely want studyID's that have 1 AssignmentType which is "Final". I do not believe that DENSE_RANK()
is what you need.
To just get the studyID's, you can use a GROUP BY
together with a couple of HAVING
conditions to check for (1) Only one row for a given studyID and (2) AssignmentType= "Final"
Something like:
SELECT S.StudyNo
FROM View_RptStudyModule S
GROUP BY S.StudyNo
HAVING COUNT(*) = 1
AND MAX(S.comments) = 'Final'
This can be wrapped up as a subselect and included as part of a larger query.
If you need the complete row instead of just the ID (to avoid a separate lookup), you could also try either:
SELECT *
FROM View_RptStudyModule S
WHERE S.comments = 'Final'
AND NOT EXISTS(
SELECT *
FROM View_RptStudyModule S2
WHERE S2.StudyNo = S.StudyNo
AND S2.comments = 'Draft'
)
or
SELECT *
FROM (
SELECT *, COUNT(*) OVER(PARTITION BY S.StudyNo) AS NumRows
FROM View_RptStudyModule S
) SN
WHERE SN.NumRows = 1
AND SN.comments = 'Final'
These should all produce the same results except for a case where a study has multiple "Final" rows and but no "Draft" rows. If your data might contain such cases, you will need to decide whether or not those rows should be included in the results.
See this db<>fiddle for a demo. Since your original post did not include table schemas or sample data, the demo uses your original posted results as source data.