sql-servert-sqllagleadpartition-by

Get previous non-NULL value based on LAG function with dynamic offset


I have a source dataset; it has data for four employees and their departments based on an effective date.

I need to convert this source dataset to the destination dataset.

enter image description here

Both datasets are properly sorted by EmployeeName and EffectiveDate (ASC) order.

(Please see T-SQL scripts using temp table.)

CREATE TABLE #source

(

EmployeeName varchar(100),
EffectiveDate date,
CurrentDepartment varchar(100)

);

INSERT INTO #source

VALUES

('Lisa','2017-06-25','Catering'),
('Lisa','2018-08-17',NULL),
('Lisa','2021-12-05','Gardening'),
('Melissa','2015-08-27',NULL),
('Melissa','2017-11-29','Office'),
('Melissa','2020-10-10','Driving'),
('Melissa','2022-07-11',NULL),
('Omar','2019-01-03',NULL),
('Omar','2020-04-07','Retail'),
('Omar','2021-03-29',NULL),
('Pat', '2012-09-12','Laundry'),
('Pat', '2013-10-30',NULL),
('Pat', '2014-11-29',NULL),
('Pat', '2015-08-16',NULL),
('Pat', '2016-11-05',NULL)


CREATE TABLE #destination

(

EmployeeName varchar(100),
EffectiveDate date,
CurrentDepartment varchar(100),
PreviousNonNULLDepartmentIfAvailable varchar(100)

);

INSERT INTO #destination

VALUES

('Lisa','2017-06-25','Catering',NULL),
('Lisa','2018-08-17',NULL,'Catering'),
('Lisa','2021-12-05','Gardening','Catering'),
('Melissa','2015-08-27',NULL,NULL),
('Melissa','2017-11-29','Office',NULL),
('Melissa','2020-10-10','Driving','Office'),
('Melissa','2022-07-11',NULL,'Driving'),
('Omar','2019-01-03',NULL,NULL),
('Omar','2020-04-07','Retail',NULL),
('Omar','2021-03-29',NULL,'Retail'),
('Pat', '2012-09-12','Laundry',NULL),
('Pat', '2013-10-30',NULL,'Laundry'),
('Pat', '2014-11-29',NULL,'Laundry'),
('Pat', '2015-08-16',NULL,'Laundry'),
('Pat', '2016-11-05',NULL,'Laundry')



SELECT *
FROM #source
ORDER BY EmployeeName, EffectiveDate


SELECT *
FROM #destination
ORDER BY EmployeeName, EffectiveDate

In the destination dataset, I need one new column called [PreviousNonNULLDepartmentIfAvailable].

What is the logic to derive this above new column?

I need to get each individual's most recent (previous) department; it is easy to use a LAG function to get the most recent (previous) department. See T-SQL code below:

PreviousNonNULLDepartmentIfAvailable = LAG(CurrentDepartment) OVER(PARTITION BY EmployeeName ORDER BY EffectiveDate)

However, I need the most recent (previous) non-NULL department; if there is no such "most recent (previous) non-NULL" department value within the PARTITION of EmployeeName, then I need to show NULL.

I have tried options such as LAG, LAST_VALUE, IGNORE NULLS clause, UNBOUNDED PRECEDING clause. These options are close to what I need, but NOT exactly what I need.

Effectively, I need to get what a LAG function would perform; but the offset value for LAG function has to be dynamic, instead of a static value such as 1 or 2 or 3...; the LAG function needs to iterate (backwards) as many rows as needed to catch the most recent (previous) non-NULL department value, within a PARTITION of EmployeeName.

This said, the column [PreviousNonNULLDepartmentIfAvailable] can still have NULL values, if there is no such "most recent (previous) non-NULL" department value available within a PARTITION of EmployeeName.

Also, the first row based on ascending order of Effective Date of each partition of EmployeeName will always have NULL as its [PreviousNonNULLDepartmentIfAvailable] value (obviously). This is natural in the way LAG function works.

Any idea on how to convert the source dataset to destination dataset ?


Solution

  • SELECT
    
    a.*,
    
    c.PreviousNonBlankCurrentDepartment AS PreviousNonNULLDepartmentIfAvailable
    
    FROM #source a
    
    OUTER APPLY
    
    (
    
    SELECT TOP 1 b.CurrentDepartment AS PreviousNonBlankCurrentDepartment
    
    FROM #source b
    WHERE b.EmployeeName = a.EmployeeName
    AND b.EffectiveDate < a.EffectiveDate
    AND b.CurrentDepartment IS NOT NULL
    
    ORDER BY b.EffectiveDate DESC
    
    ) c