Multiple rows aggregated to single row

Business:
I have one manager tied to multiple employees/single employees across different business and Manager and employees are in different rows. We have to map on ID and also on zip codes (both zip5 and zip3).If there is no match on ID, map it on zip5 and if there is no match then on zip3. Manager is main table.

Code:
I tried to segregate manager and employee information into two different tables and then try to left join employee with manager. I have to join on ID Or zip5 or zip3 and business. I tried below logic but the join conditions are messing up the data because of nulls. I also tried max, but it wont work when there are multiple employees for same manager.

DROP TABLE IF EXISTS #table;
create table #table
(ID int,
zip5 int,
zip3 int,
Manager_Region varchar(50),
manager_Name varchar(100),
Employee_name varchar(50),
Employee_region varchar(100),
business varchar(20),
source varchar(20)
)

Insert into #table
select 1234,null,null,'East','Mike',null,null,'Healthcare','Source1'
UNION ALL
select 1234,null,null,null,null,'John','NorthEast','Healthcare','Source1'
UNION ALL
select 1234,null,null,null,null,'Patty','EastCentral','Healthcare','Source1'
UNION ALL
select Null,34589,null,'West','Kat',Null,Null,'dental','Source2'
UNION ALL
select Null,34589,null,null,Null,'Kim','NorthWestern','dental','Source2'
UNION ALL
select Null,null,345,'South','rita',Null,Null,'Healthcare','Source3'
UNION ALL
select Null,null,345,null,Null,'Sam','southCentral','Healthcare','Source3'


DROP TABLE IF EXISTS #Manager_Is_Null;
select * into #Manager_Is_Null
from #table 
where  Manager_Region is null


DROP TABLE IF EXISTS #Employee_is_Null;
select * into #Employee_is_Null
from #table 
where  Employee_Region is null 


select s.ID,s.zip5,s.zip3,
s.business,
s.Manager_Region,
S.Manager_Name,
a.employee_Region,
a.employee_Name
from #Employee_is_Null s
left join #Manager_Is_Null a
on ((isnull(a.ID,'') = isnull(s.ID,'')) OR
        (isnull(a.ID,'') <> isnull(s.ID,'') AND isnull(a.zip5,'') = isnull(s.zip5,'')) OR
        (isnull(a.ID,'') <> isnull(s.ID,'') AND isnull(a.zip5,'') <> isnull(s.zip5,'') AND isnull(a.zip3,'') = isnull(a.zip3,'')))
        and a.business=s.business

Expected Output:

ID	zip5	zip3	Manager_Region	Manager_Name	Employee_Name	Employee_Region	Business	source
1234	null	null	East	Mike	John	NorthEast	Healthcare	Source1
1234	null	null	East	Mike	Patty	EastCentral	Healthcare	Source1
null	34589	null	West	Kat	Kim	NorthWestern	dental	Source2
null	null	345	South	rita	Sam	southCentral	Healthcare	Source3

How can I acheive this resultset?

Solution

Using window functions

This kind of problem (getting only one "best match" per row to output) can be solved by using window functions, and in particular row_number() over (…) (introduced on SQL Server 2005).

row_number() over (partition by criteria1, criteria2 order by criteria3, criteria4, criteria5) will compute on the server (and return to you), for each row of the result set (after joins and where have been applied), what its position would be in a sub-resultset consisting only of rows with the same criteria1 and criteria2, ordered by criteria3, criteria4, criteria5.

Thus in your case, after joining employees to managers,
if criteria1, criteria2 was your employee's identifier (ID and name?),
and order by criteria3, criteria4, criteria5 was an order by [do IDs match?], [do zip5s match?], [do zip3s match?],
then potential employee - manager combinations would be sorted (within each employee) with first employers having the same ID, then employers with different ID but same zip5, then those with different ID and zip5 but same zip3.
And your row_number() would be the position of the employee - employer combination within the combinations for this employee: thus each employee would have its best matching employer with row_number() = 1.

However you can't then filter on the results of row_number() through the where (remember I said earlier that window functions where computed after the where?),
so you'll have to first ask for the full resultset, with its row_number()-generated column, in a subquery (or Common Table Expression), then have a wrapping query to keep only the rows with row_number() = 1.

Last hint: you don't need *_Is_Not_Null tables, you can use a self-join of #table (#table joined to itself, but each occurrence with a different alias: #table as employee left join #table as manager) and use on and where to ensure the employee alias only contains employees, and employers only employers.

Your query would then be:

with
  scored_matches as
  (
    select
      e.ID,e.zip5,e.zip3,
      e.business,
      m.Manager_Region,
      m.Manager_Name,
      e.employee_Region,
      e.employee_Name,
      row_number() over
      (
        partition by e.ID, e.employee_Name
        order by
          case when m.ID = e.ID then 0 end,
          case when m.zip5 = e.zip5 then 0 end,
          case when m.zip3 = e.zip3 then 0 end
      ) match_pos
    from #table e
    left join #table m on m.Manager_name is not null and
    (
         m.ID = e.ID
      or m.zip5 = e.zip5
      or m.zip3 = e.zip3
    )
    where e.Employee_name is not null
  )
select * from scored_matches
where match_pos = 1;

which returns:

ID	zip5	zip3	business	Manager_Region	Manager_Name	employee_Region	employee_Name	match_pos
null	34589	null	dental	West	Kat	NorthWestern	Kim	1
null	null	345	Healthcare	South	rita	southCentral	Sam	1
1234	null	null	Healthcare	East	Mike	NorthEast	John	1
1234	null	null	Healthcare	East	Mike	EastCentral	Patty	1

(as generated as the last query of this fiddle, on SQL Server 2014 (2012 is really too old, I can't test on it))

With `top 1`

Another similar solution that I won't expand here is to use a select top 1 subquery:

select …
from employees
cross apply (select top 1 … from employers where … order by …) as employers;

which in essence means: for each employee, join to employers but keep only the first employer row (after the order by, thus the best match) matching the employee (the employee-centered selection being done through the where, with clauses like employer.ID = employee.ID or …).

By carefully joining

While applying the general method above (which makes sure there is only one manager per employee), I noticed that every employee - manager combination was ranked 1.
Which means that in your dataset you already have no ambiguity (there are no 2 "manager candidates" for any employee, for example 1 matching the ID and 1 super-manager matching the zip5, with a need to select the most direct manager (the ID-matching one) over the other).

In that case where there is at most 1 manager matching ID or zip5 or zip3 you don't need the window function and its post-filtering in a wrapping query, you simply can take the inner query:

    select
      e.ID,e.zip5,e.zip3,
      e.business,
      m.Manager_Region,
      m.Manager_Name,
      e.employee_Region,
      e.employee_Name
    from #table e
    left join #table m on m.Manager_name is not null and
    (
         m.ID = e.ID
      or m.zip5 = e.zip5
      or m.zip3 = e.zip3
    )
    where e.Employee_name is not null;

(see it added at the end of this fiddle)

But I would not recommend it, as it relies heavily on your business constraints (no 2 possible managers per employee) being maintained in your database; on the contrary, row_number() robustly ensures that, even if an employee accidentally has 2 managers, only 1 will get through the filter (even if they have the same level of matching, for example both have the same ID, row_number() will return different values, arbitrarily choosing one as 1 and the other as 2).

Multiple rows aggregated to single row

Using window functions

With top 1

By carefully joining

With `top 1`