Business:
I have one manager tied to multiple employees/single employees across different business and Manager and employees are in different rows. We have to map on ID and also on zip codes (both zip5 and zip3).If there is no match on ID, map it on zip5 and if there is no match then on zip3. Manager is main table.
Code:
I tried to segregate manager and employee information into two different tables and then try to left join employee with manager. I have to join on ID Or zip5 or zip3 and business. I tried below logic but the join conditions are messing up the data because of nulls. I also tried max, but it wont work when there are multiple employees for same manager.
DROP TABLE IF EXISTS #table;
create table #table
(ID int,
zip5 int,
zip3 int,
Manager_Region varchar(50),
manager_Name varchar(100),
Employee_name varchar(50),
Employee_region varchar(100),
business varchar(20),
source varchar(20)
)
Insert into #table
select 1234,null,null,'East','Mike',null,null,'Healthcare','Source1'
UNION ALL
select 1234,null,null,null,null,'John','NorthEast','Healthcare','Source1'
UNION ALL
select 1234,null,null,null,null,'Patty','EastCentral','Healthcare','Source1'
UNION ALL
select Null,34589,null,'West','Kat',Null,Null,'dental','Source2'
UNION ALL
select Null,34589,null,null,Null,'Kim','NorthWestern','dental','Source2'
UNION ALL
select Null,null,345,'South','rita',Null,Null,'Healthcare','Source3'
UNION ALL
select Null,null,345,null,Null,'Sam','southCentral','Healthcare','Source3'
DROP TABLE IF EXISTS #Manager_Is_Null;
select * into #Manager_Is_Null
from #table
where Manager_Region is null
DROP TABLE IF EXISTS #Employee_is_Null;
select * into #Employee_is_Null
from #table
where Employee_Region is null
select s.ID,s.zip5,s.zip3,
s.business,
s.Manager_Region,
S.Manager_Name,
a.employee_Region,
a.employee_Name
from #Employee_is_Null s
left join #Manager_Is_Null a
on ((isnull(a.ID,'') = isnull(s.ID,'')) OR
(isnull(a.ID,'') <> isnull(s.ID,'') AND isnull(a.zip5,'') = isnull(s.zip5,'')) OR
(isnull(a.ID,'') <> isnull(s.ID,'') AND isnull(a.zip5,'') <> isnull(s.zip5,'') AND isnull(a.zip3,'') = isnull(a.zip3,'')))
and a.business=s.business
Expected Output:
ID | zip5 | zip3 | Manager_Region | Manager_Name | Employee_Name | Employee_Region | Business | source |
---|---|---|---|---|---|---|---|---|
1234 | null | null | East | Mike | John | NorthEast | Healthcare | Source1 |
1234 | null | null | East | Mike | Patty | EastCentral | Healthcare | Source1 |
null | 34589 | null | West | Kat | Kim | NorthWestern | dental | Source2 |
null | null | 345 | South | rita | Sam | southCentral | Healthcare | Source3 |
How can I acheive this resultset?
This kind of problem (getting only one "best match" per row to output) can be solved by using window functions, and in particular row_number() over (…)
(introduced on SQL Server 2005).
row_number() over (partition by criteria1, criteria2 order by criteria3, criteria4, criteria5)
will compute on the server (and return to you), for each row of the result set (after join
s and where
have been applied), what its position would be in a sub-resultset consisting only of rows with the same criteria1
and criteria2
, ordered by criteria3, criteria4, criteria5
.
Thus in your case, after joining employees to managers,
if criteria1, criteria2
was your employee's identifier (ID and name?),
and order by criteria3, criteria4, criteria5
was an order by [do IDs match?], [do zip5s match?], [do zip3s match?]
,
then potential employee - manager combinations would be sorted (within each employee) with first employers having the same ID, then employers with different ID but same zip5, then those with different ID and zip5 but same zip3.
And your row_number()
would be the position of the employee - employer combination within the combinations for this employee: thus each employee would have its best matching employer with row_number() = 1
.
However you can't then filter on the results of row_number()
through the where
(remember I said earlier that window functions where computed after the where
?),
so you'll have to first ask for the full resultset, with its row_number()
-generated column, in a subquery (or Common Table Expression), then have a wrapping query to keep only the rows with row_number() = 1
.
Last hint: you don't need *_Is_Not_Null
tables, you can use a self-join of #table
(#table
joined to itself, but each occurrence with a different alias: #table as employee left join #table as manager
) and use on
and where
to ensure the employee
alias only contains employees, and employers
only employers.
Your query would then be:
with
scored_matches as
(
select
e.ID,e.zip5,e.zip3,
e.business,
m.Manager_Region,
m.Manager_Name,
e.employee_Region,
e.employee_Name,
row_number() over
(
partition by e.ID, e.employee_Name
order by
case when m.ID = e.ID then 0 end,
case when m.zip5 = e.zip5 then 0 end,
case when m.zip3 = e.zip3 then 0 end
) match_pos
from #table e
left join #table m on m.Manager_name is not null and
(
m.ID = e.ID
or m.zip5 = e.zip5
or m.zip3 = e.zip3
)
where e.Employee_name is not null
)
select * from scored_matches
where match_pos = 1;
which returns:
ID | zip5 | zip3 | business | Manager_Region | Manager_Name | employee_Region | employee_Name | match_pos |
---|---|---|---|---|---|---|---|---|
null | 34589 | null | dental | West | Kat | NorthWestern | Kim | 1 |
null | null | 345 | Healthcare | South | rita | southCentral | Sam | 1 |
1234 | null | null | Healthcare | East | Mike | NorthEast | John | 1 |
1234 | null | null | Healthcare | East | Mike | EastCentral | Patty | 1 |
(as generated as the last query of this fiddle, on SQL Server 2014 (2012 is really too old, I can't test on it))
top 1
Another similar solution that I won't expand here is to use a select top 1
subquery:
select …
from employees
cross apply (select top 1 … from employers where … order by …) as employers;
which in essence means: for each employee, join to employers but keep only the first employer row (after the order by
, thus the best match) matching the employee (the employee-centered selection being done through the where
, with clauses like employer.ID = employee.ID or …
).
While applying the general method above (which makes sure there is only one manager per employee), I noticed that every employee - manager combination was ranked 1.
Which means that in your dataset you already have no ambiguity (there are no 2 "manager candidates" for any employee, for example 1 matching the ID
and 1 super-manager matching the zip5
, with a need to select the most direct manager (the ID
-matching one) over the other).
In that case where there is at most 1 manager matching ID
or zip5
or zip3
you don't need the window function and its post-filtering in a wrapping query, you simply can take the inner query:
select
e.ID,e.zip5,e.zip3,
e.business,
m.Manager_Region,
m.Manager_Name,
e.employee_Region,
e.employee_Name
from #table e
left join #table m on m.Manager_name is not null and
(
m.ID = e.ID
or m.zip5 = e.zip5
or m.zip3 = e.zip3
)
where e.Employee_name is not null;
(see it added at the end of this fiddle)
But I would not recommend it, as it relies heavily on your business constraints (no 2 possible managers per employee) being maintained in your database; on the contrary, row_number()
robustly ensures that, even if an employee accidentally has 2 managers, only 1 will get through the filter (even if they have the same level of matching, for example both have the same ID, row_number()
will return different values, arbitrarily choosing one as 1
and the other as 2
).