software im using is snowflake
I have a table of an employees job assignments with the company. They can hold multiple jobs actively at once (null in end_date
). I'd like to consolidate this into 1 row:
EMPLOYEE_ID | START_DATE | END_DATE | JOB_TITLE |
---|---|---|---|
1442 | 7/30/24 | Tutor | |
1442 | 7/30/24 | Tutor | |
1442 | 6/28/24 | Instructional Specialist | |
1442 | 5/1/24 | 6/27/24 | Instructional Specialist |
1442 | 12/16/21 | 7/29/24 | Tutor |
1442 | 12/16/21 | Lead Instructor | |
1442 | 12/16/21 | 7/29/24 | Tutor |
If an employee has any null values in the end_date
field, then Id like to only retrieve distinct job_titles (eliminate top 2 rows to 1 because both same job on same start date)
1-5 in desc order based on start_date
like this:
EMPLOYEE_ID | Job_Title_1 | Job_Title_2 | Job_Title_3 | Job_Title_4 | Job_Title_5 |
---|---|---|---|---|---|
1442 | Tutor | Instructional Specialist | Lead Instructor |
now lets say this employee had no currently active jobs, the table would look like this:
EMPLOYEE_ID | START_DATE | END_DATE | JOB_TITLE |
---|---|---|---|
1442 | 5/1/24 | 6/27/24 | Instructional Specialist |
1442 | 12/16/21 | 7/29/24 | Tutor |
1442 | 12/16/21 | 7/29/24 | Tutor |
in that case I'd like the table to look like this:
EMPLOYEE_ID | Job_Title_1 | Job_Title_2 | Job_Title_3 | Job_Title_4 | Job_Title_5 |
---|---|---|---|---|---|
1442 | Instructional Specialist | Tutor |
Here is the query I am using, and it works, but it's not ordering the job_title 1-5 columns by desc start_date order:
WITH job_position_info_ADP AS (
SELECT
'ADP' AS source,
CAST(w.associate_oid AS STRING) AS worker_id,
CAST(w.id AS STRING) AS Employee_ID,
TO_CHAR(wah._fivetran_start, 'MM/DD/YY') AS start_date,
CASE
WHEN wah._fivetran_active = TRUE THEN NULL
ELSE TO_CHAR(wah._fivetran_end, 'MM/DD/YY')
END AS end_date,
wah.job_title AS Job_Title,
ROW_NUMBER() OVER (PARTITION BY CAST(w.id AS STRING) ORDER BY wah._fivetran_start DESC) AS rn
FROM
prod_raw.adp_workforce_now.worker w
JOIN
prod_raw.adp_workforce_now.worker_report_to AS wr
ON w.id = wr.worker_id
JOIN
prod_raw.adp_workforce_now.work_assignment_history AS wah
ON w.id = wah.worker_id
),
recent_jobs_with_null_end AS (
SELECT
Employee_ID,
Job_Title,
ROW_NUMBER() OVER (PARTITION BY Employee_ID ORDER BY start_date DESC) AS rn
FROM
job_position_info_ADP
WHERE
end_date IS NULL
),
recent_jobs_all AS (
SELECT
Employee_ID,
Job_Title,
ROW_NUMBER() OVER (PARTITION BY Employee_ID ORDER BY start_date DESC) AS rn
FROM
job_position_info_ADP
)
SELECT
Employee_ID,
MAX(CASE WHEN rn = 1 THEN Job_Title END) AS Job_Title_1,
MAX(CASE WHEN rn = 2 THEN Job_Title END) AS Job_Title_2,
MAX(CASE WHEN rn = 3 THEN Job_Title END) AS Job_Title_3,
MAX(CASE WHEN rn = 4 THEN Job_Title END) AS Job_Title_4,
MAX(CASE WHEN rn = 5 THEN Job_Title END) AS Job_Title_5
FROM (
SELECT * FROM recent_jobs_with_null_end
UNION ALL
SELECT * FROM recent_jobs_all
WHERE Employee_ID NOT IN (SELECT Employee_ID FROM recent_jobs_with_null_end)
) AS combined
WHERE
Employee_ID = '1442'
GROUP BY
Employee_ID;
This is a replicable code in SQL Server, including schema and data.
The idea was to reorganize the work_assignment_history table to order the job positions by date without including the current positions (END_DATE is null).
Then, in job_unique, duplicates are removed, and the line number is recalculated. It was discovered that the initial line number is different after removing duplicates, and the date transformed into characters makes it difficult to calculate the new line number.
Finally, the PIVOT operation is applied to convert rows into columns.
;WITH job_position_info_ADP AS (
SELECT
CAST(wah.EMPLOYEE_ID AS varchar(10)) AS Employee_ID,
wah.START_DATE AS start_date,
wah.job_title AS Job_Title,
ROW_NUMBER() OVER (PARTITION BY wah.EMPLOYEE_ID ORDER BY wah.START_DATE DESC) AS rn
FROM
[development].[dbo].work_assignment_history AS wah
WHERE
wah.END_DATE is not null
),
job_unique as
(
select Employee_ID, start_date, Job_Title, max(rn) as rn,
ROW_NUMBER() OVER (PARTITION BY EMPLOYEE_ID ORDER BY start_date DESC) AS new_rn
from job_position_info_ADP
group by Employee_ID, start_date, Job_Title
)
SELECT Employee_ID, [1], [2], [3], [4],[5]
FROM
(
SELECT Employee_ID, job_title,new_rn
FROM job_unique
) AS SourceTable
PIVOT
(
MAX(job_title)
FOR new_rn IN ([1], [2], [3], [4],[5])
) AS PivotTable;
Output
Employee_ID 1 2 3 4 5
1442 Especialista en Instrucción Tutor NULL NULL NULL
I hope it is easy to translate to Snowflake, as most SQL languages have PIVOT operations with syntax changes.
Schema
CREATE TABLE [dbo].[work_assignment_history ](
[EMPLOYEE_ID] [smallint] NOT NULL,
[START_DATE] [date] NOT NULL,
[END_DATE] [date] NULL,
[JOB_TITLE] [nvarchar](50) NOT NULL
) ON [PRIMARY]
GO
Data
EMPLOYEE_ID;START_DATE;END_DATE;JOB_TITLE
1442;7/30/24;;Tutor
1442;7/30/24;;Tutor
1442;6/28/24;;Especialista en Instrucción
1442;5/1/24;6/27/24;Especialista en Instrucción
1442;12/16/21;7/29/24;Tutor
1442;12/16/21;;Instructor Principal
1442;12/16/21;7/29/24;Tutor
Note To understand the question and develop a solution, the data was translated into Spanish. My apologies.