I have a source dataset containing four employees and the department each one belonged to as of a given effective date.
I need to convert this source dataset into the destination dataset.
Both datasets are sorted by EmployeeName and EffectiveDate, in ascending order.
(Please see the T-SQL scripts below, which use temp tables.)
CREATE TABLE #source
(
EmployeeName varchar(100),
EffectiveDate date,
CurrentDepartment varchar(100)
);
INSERT INTO #source
VALUES
('Lisa','2017-06-25','Catering'),
('Lisa','2018-08-17',NULL),
('Lisa','2021-12-05','Gardening'),
('Melissa','2015-08-27',NULL),
('Melissa','2017-11-29','Office'),
('Melissa','2020-10-10','Driving'),
('Melissa','2022-07-11',NULL),
('Omar','2019-01-03',NULL),
('Omar','2020-04-07','Retail'),
('Omar','2021-03-29',NULL),
('Pat', '2012-09-12','Laundry'),
('Pat', '2013-10-30',NULL),
('Pat', '2014-11-29',NULL),
('Pat', '2015-08-16',NULL),
('Pat', '2016-11-05',NULL);
CREATE TABLE #destination
(
EmployeeName varchar(100),
EffectiveDate date,
CurrentDepartment varchar(100),
PreviousNonNULLDepartmentIfAvailable varchar(100)
);
INSERT INTO #destination
VALUES
('Lisa','2017-06-25','Catering',NULL),
('Lisa','2018-08-17',NULL,'Catering'),
('Lisa','2021-12-05','Gardening','Catering'),
('Melissa','2015-08-27',NULL,NULL),
('Melissa','2017-11-29','Office',NULL),
('Melissa','2020-10-10','Driving','Office'),
('Melissa','2022-07-11',NULL,'Driving'),
('Omar','2019-01-03',NULL,NULL),
('Omar','2020-04-07','Retail',NULL),
('Omar','2021-03-29',NULL,'Retail'),
('Pat', '2012-09-12','Laundry',NULL),
('Pat', '2013-10-30',NULL,'Laundry'),
('Pat', '2014-11-29',NULL,'Laundry'),
('Pat', '2015-08-16',NULL,'Laundry'),
('Pat', '2016-11-05',NULL,'Laundry');
SELECT *
FROM #source
ORDER BY EmployeeName, EffectiveDate
SELECT *
FROM #destination
ORDER BY EmployeeName, EffectiveDate
In the destination dataset, I need one new column called [PreviousNonNULLDepartmentIfAvailable].
The logic for deriving this new column is as follows.
For each row, I need the employee's most recent previous department. Getting the immediately preceding department is easy with the LAG function. See the T-SQL expression below:
PreviousNonNULLDepartmentIfAvailable = LAG(CurrentDepartment) OVER(PARTITION BY EmployeeName ORDER BY EffectiveDate)
However, I need the most recent (previous) non-NULL department; if there is no such "most recent (previous) non-NULL" department value within the PARTITION of EmployeeName, then I need to show NULL.
I have tried options such as LAG, LAST_VALUE, the IGNORE NULLS clause, and the UNBOUNDED PRECEDING clause. These come close to what I need, but not exactly.
Effectively, I need what the LAG function does, but with a dynamic offset instead of a static value such as 1, 2, or 3. The function needs to step backwards through as many rows as necessary to reach the most recent (previous) non-NULL department value within the EmployeeName partition.
This said, the column [PreviousNonNULLDepartmentIfAvailable] can still have NULL values, if there is no such "most recent (previous) non-NULL" department value available within a PARTITION of EmployeeName.
Also, the first row of each EmployeeName partition (in ascending EffectiveDate order) will always have NULL as its [PreviousNonNULLDepartmentIfAvailable] value, since there is no earlier row to look at. This is the same way the LAG function behaves.
Any idea how to convert the source dataset to the destination dataset?
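SELECT
    a.*,
    c.PreviousNonBlankCurrentDepartment AS PreviousNonNULLDepartmentIfAvailable
FROM #source a
-- OUTER APPLY keeps every row of a, returning NULL when the subquery finds nothing
OUTER APPLY
(
    -- Latest earlier row for the same employee that has a non-NULL department
    SELECT TOP 1 b.CurrentDepartment AS PreviousNonBlankCurrentDepartment
    FROM #source b
    WHERE b.EmployeeName = a.EmployeeName
      AND b.EffectiveDate < a.EffectiveDate
      AND b.CurrentDepartment IS NOT NULL
    ORDER BY b.EffectiveDate DESC
) c
ORDER BY a.EmployeeName, a.EffectiveDate;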
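As a possible alternative to the OUTER APPLY query, the "dynamic LAG" described in the question can be sketched with a window function. This assumes SQL Server 2022 or later (or Azure SQL Database), where LAST_VALUE accepts IGNORE NULLS; on earlier versions the query below will not parse. The alias s is introduced only for illustration. The frame ends at 1 PRECEDING so that the current row's own department is excluded, and the first row of each partition gets an empty frame and therefore NULL.

```sql
-- Sketch only: assumes SQL Server 2022+ (IGNORE NULLS is unavailable before that).
SELECT
    s.EmployeeName,
    s.EffectiveDate,
    s.CurrentDepartment,
    PreviousNonNULLDepartmentIfAvailable =
        LAST_VALUE(s.CurrentDepartment) IGNORE NULLS
        OVER (
            PARTITION BY s.EmployeeName
            ORDER BY s.EffectiveDate
            -- Look at all earlier rows, but not the current one
            ROWS BETWEEN UNBOUNDED PRECEDING AND 1 PRECEDING
        )
FROM #source AS s
ORDER BY s.EmployeeName, s.EffectiveDate;
```

The window version reads #source once, while the OUTER APPLY version runs on older SQL Server versions as well; which one performs better will depend on the data volume and the indexes available.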