I have this query:
SELECT
c.ID, c.Firstname, c.lastname, c.BDaY, c.gender,
cp.code, cp.Citizenship, r.race, e.ethnicity
FROM
Client AS C (nolock)
JOIN
Citizenship AS cp (nolock) ON c.ID = cp.client_ID
JOIN
Race AS r (nolock) ON c.ID = R.Client_ID
JOIN
Ethnicity AS E (nolock) ON E.Client_ID = c.ID
This query will return some of the client's names duplicated because they have different race and ethnicity.
Example:
ID |FirstName|Lastname| BDay | gender | code |citizenship| race | ethnicity
1 Pedram Salamati 01-20-1998 M 1 US citizen Middle-east Spanish
1 Pedram Salamati 01-20-1998 M 1 US Citizen Middle-east unknown
1 Pedram Salamati 01-20-1998 M 1 US Citizen Middle-east Brazilian
2 Jesse Albert 03-05-1982 F 1 US Citizen African not Spanish
2 Jesse Albert 03-05-1982 F 1 US Citizen American not Spanish
I was wondering if there is any way to say if race is not = than Race should be Multiracial and if ethnicity Is not = to each other for same Id choose the last updated one.
PS.Ethnicity
has time stamp and I can use Max(e.LastUpdate)
I think maybe a sub query can help!
Any help or thought will be much appreciated!
Here is some test data to mimic your environment in the future you should separate the tables involved and test data. including DML statements is appropriate and helpful as well so people can try their solution prior to answering.
DECLARE @Client AS TABLE (ID INT, Firstname VARCHAR(25), LastName VARCHAR(25), BDay DATE, Gender CHAR(1))
INSERT INTO @Client VALUES (1,'Pedram','Salamati','01-20-1998','M')
,(2,'Jesse','Albert','03-05-1982','F')
DECLARE @Citizenship AS TABLE (Client_ID INT, Code INT, Citizenship VARCHAR(100))
INSERT INTO @Citizenship VALUES (1,1,'US citizen'),(2,1,'US citizen')
DECLARE @Ethnicity AS TABLE (Client_ID INT, Ethnicity VARCHAR(50))
INSERT INTO @Ethnicity VALUES (1,'Spanish'),(1,'unknown'),(1,'Brazilian'),(2,'not Spanish')
DECLARE @Race AS TABLE (Client_Id INT, Race VARCHAR(50), LastUpdate DATETIME)
INSERT INTO @Race VALUES (1,'Middle-east',GETDATE()),(2,'African',GETDATE()),(2,'American',GETDATE() -1)
With those variables you can do the following, there of course is more than 1 way this is simply 1 way I am choosing for a few reasons:
;WITH cteEthnicity AS (
SELECT
e.Client_ID
,CASE WHEN COUNT(DISTINCT e.Ethnicity) > 1 THEN 'Multiracial' ELSE MIN(e.Ethnicity) END as Ethnicity
FROM
@Ethnicity e
GROUP BY
e.Client_ID
)
, cteRace AS (
SELECT
r.Client_Id
,r.Race
,ROW_NUMBER() OVER (PARTITION BY r.Client_Id ORDER BY r.LastUpdate DESC) as RowNumber
FROM
@Race r
)
SELECT
c.ID
,c.Firstname
,c.lastname
,c.BDaY
,c.gender
,cp.code
,cp.Citizenship
,r.race
,e.ethnicity
From
@Client AS C --(nolock)
Join @Citizenship as cp --(nolock)
on c.ID = cp.client_ID
Join cteRace as r --(nolock)
ON c.ID = R.Client_ID
AND r.RowNumber = 1
Join cteEthnicity as E --(nolock)
ON E.Client_ID = c.ID
You displayed 2 issue 1 with race and 1 with ethnicity
For Ethnicity: you want to use aggregation to determine which ethnicity to assign. this can also be done with a window function but the way I wrote it here it will account for duplicates to exist even in the Ethnicity table.
For Race: you simply want the latest row partitioned by client you can use the ROW_NUMBER() function to generate that and then select where it equals 1 in the join statement
A third issue that you didn't point out but could be possible in some countries anyway is DUAL CITIZENSHIP. In that case you could use a method similar to that of Race.
Note even though Common Table Expressions [CTE] are used you can actually nest those as subselect as well.