sqlsql-serverjoinstored-proceduresgreatest-n-per-group

Getting only one row of data per primary key when there are many foreign keys that equals it


I have a procedure where I am trying to fetch the only on row of data per foreign key (FK) per (PK) in a table (turning a one-to-many relationship into a one-to-one table).

I have some tables that looks like below. NOTE: my tables do not actually look like this, these are just examples.

COMPANY

CompanyNo_PK CompanyName
1 Foo
2 Bar
3 FooBar

JOB ROLE

JobRoleNo_PK RoleName
1 Manager
2 IT
3 Clerk

EMPLOYEE

EmployeeNo_PK EmployeeName CompanyNo_FK JobRoleNo_FK DateHired
1 Steve 1 1 1/3/2018
2 Bob 2 1 1/9/2021
3 Daisy 3 1 3/7/2020
4 John 1 2 2/9/2019
5 Sam 1 3 5/17/2022
6 Kathy 1 1 6/25/2022
7 Cait 3 3 9/6/2020
8 Adam 3 1 8/3/2021
9 Mario 2 2 4/30/2020

My procedure should output a table using the tables above to only see the MOST CURRENT Manager for each Company.

So the output should look something like this:

CURRENT COMPANY MANAGERS

CompanyName EmployeeName DateHired
Foo Kathy 6/25/2022
Bar Bob 1/9/2021
FooBar Adam 8/3/2021

I tried the following query below, but I did not get the result I wanted.

CREATE PROCEDURE [Current Company Managers] 
AS
    SELECT DISTINCT 
        c.[CompanyName], e.[EmployeeName], e.[DateHired]
    FROM  
        [Company] c 
    LEFT JOIN 
        [Employee] e ON e.CompanyNo_FK = c.CompanyNoPK 
                     AND JobRoleNo_FK = 1
                     AND (NOT EXISTS (SELECT 1 
                                      FROM [Employee] ee 
                                      WHERE c.CompanyNo_PK = ee.CompanyNo_FK 
                                        AND ee.DateHired > e.DateHired)
                          OR NOT EXISTS (SELECT 1 
                                         FROM [Employee] ee 
                                         WHERE c.CompanyNo_PK = ee.CompanyNo_FK 
                                           AND ee.DateHired < e.DateHired))

I included the OR NOT EXISTS operation since there were cases where some current managers were not being listed in the view when they should be or the managers being listed were incorrect (mainly the older entries).

However, even with that, the results would returns the wrong entries:

CompanyName EmployeeName DateHired
Foo Kathy 6/25/2022
Bar Bob 1/9/2021
FooBar Daisy 3/7/2020

I have multiple entries per company:

CompanyName EmployeeName DateHired
Foo Kathy 6/25/2022
Bar Bob 1/9/2021
FooBar Adam 8/3/2021
FooBar Daisy 3/7/2020,

but I'm also missing some entries:

CompanyName EmployeeName DateHired
Bar Bob 1/9/2021
FooBar Adam 8/3/2021

or some combination of the three. So explanation for why this is happening will be much appreciated.


Solution

  • You want exactly one employee per company, and only the most recent manager hired. You can figure out the latter two using a TOP (1) inside a correlated APPLY.

    SELECT c.CompanyName, e.* FROM Company AS c
      OUTER APPLY 
      (SELECT TOP (1) EmployeeName, DateHired
        FROM Employee AS e
      INNER JOIN JobRole AS j
      ON e.JobRoleNo_FK = j.JobRoleNo_PK
      WHERE j.RoleName = N'Manager'
        AND e.CompanyNo_FK = c.CompanyNo_PK
      ORDER BY DateHired DESC) AS e;
    

    Output:

    CompanyName EmployeeName DateHired
    Foo Kathy 2022-06-25
    Bar Bob 2021-01-09
    FooBar Adam 2021-08-03

    This will include companies that don't have any managers (if you want to filter those out, use CROSS APPLY instead of OUTER APPLY).

    If you know the job role number already, or have already passed it in from somewhere, you can simplify a little:

    SELECT c.CompanyName, e.* FROM Company AS c
      OUTER APPLY (SELECT TOP (1) EmployeeName, DateHired
        FROM Employee AS e
      WHERE e.JobRoleNo_FK = 1
        AND e.CompanyNo_FK = c.CompanyNo_PK
      ORDER BY DateHired DESC) AS e;
    

    You can use a CTE approach too, I just find it a lot more complicated to write:

    WITH cte AS
    (
      SELECT c.CompanyName, e.EmployeeName, e.DateHired, 
          LatestRow = ROW_NUMBER() OVER 
            (PARTITION BY c.CompanyNo_PK 
             ORDER BY e.DateHired DESC)
        FROM Company AS c
        LEFT OUTER JOIN Employee AS e
        ON c.CompanyNo_PK = e.CompanyNo_FK
        AND e.JobRoleNo_FK = 1 -- or @JobRoleNo param
      )
    SELECT CompanyName, EmployeeName, DateHired
      FROM cte
      WHERE LatestRow = 1;