We are in the early stages of migrating some data from a legacy system to a new system that seems to have some quirks.
Currently we have some data that is as follows
Name Date Award
Roger 2024-06-01 A1
Roger 2024-06-01 A2
Roger 2024-07-01 A3
Alice 2024-06-01 A4
Alice 2024-07-01 A5
Aidan 2024-01-01 A6
I was trying to get the data into the following format
Name 2024-06-01 2024-06-01 2024-07-01 2024-01-01
Roger A1 A2 A3 -
Alice A4 - A5 -
Aidan - - - A6
Few issues with the data is that the number of rows can be variable. Thus when pivoting the number of columns can also be variable. The other issue is if I was trying to pivot on "Date" there can be times when an individual may get more then one award on a day, so the "Date" column could also be duplicated a few times.
I tried to do a search and found a few different solutions one via SQL (our data is coming from Oracle) and the next option was Excel Power Query.
When I searched for a SQL solution I found Oracle has the pivot function but this would require aggregation. I was then able to find maybe I could use a MAX or MIN function (conditional aggregation?) - But unfortunately this would cause some data to be lost -> For example with "Roger" on 2024-06-01 only one award would be in the output.
Looking for an Excel option I was able to find an article that suggested using Excel Power Query to pivot the table data. This seemed to work fine until I hit this row:
Name Date Award
Roger 2024-06-01 A1
Roger 2024-06-01 A2
It threw an error alluding to how it can not aggregate the row to column since there were 2 values of 2024-06-01. Again if I used MAX/MIN it would fix the error but then I would lose either the A1 or A2 award data.
I know Pivot is fundamentally meant to aggregate/group values so I am having a tough time finding how to get the desired output. Not sure if SQL or Excel would be the best way to go about things or if this is totally nonsensical request I would appreciate any ideas or input
In Oracle, you can use PIVOT
and LISTAGG
:
SELECT *
FROM table_name
PIVOT(
LISTAGG(award, ',')
FOR dt IN (
DATE '2024-01-01' AS "2024-01-01",
DATE '2024-06-01' AS "2024-06-01",
DATE '2024-07-01' AS "2024-07-01"
)
)
Which, for the sample data:
CREATE TABLE table_name (Name, DT, Award) AS
SELECT 'Roger', DATE '2024-06-01', 'A1' FROM DUAL UNION ALL
SELECT 'Roger', DATE '2024-06-01', 'A2' FROM DUAL UNION ALL
SELECT 'Roger', DATE '2024-07-01', 'A3' FROM DUAL UNION ALL
SELECT 'Alice', DATE '2024-06-01', 'A4' FROM DUAL UNION ALL
SELECT 'Alice', DATE '2024-07-01', 'A5' FROM DUAL UNION ALL
SELECT 'Aidan', DATE '2024-01-01', 'A6' FROM DUAL;
Outputs:
NAME | 2024-01-01 | 2024-06-01 | 2024-07-01 |
---|---|---|---|
Aidan | A6 | null | null |
Alice | null | A4 | A5 |
Roger | null | A1,A2 | A3 |
Or, numbering the rows before pivoting:
SELECT *
FROM (
SELECT t.*,
ROW_NUMBER() OVER (PARTITION BY name, dt ORDER BY award) AS rn
FROM table_name t
)
PIVOT(
MAX(award)
FOR (dt, rn) IN (
(DATE '2024-01-01', 1) AS "2024-01-01_1",
(DATE '2024-06-01', 1) AS "2024-06-01_1",
(DATE '2024-06-01', 2) AS "2024-06-01_2",
(DATE '2024-07-01', 1) AS "2024-07-01_1"
)
)
Outputs:
NAME | 2024-01-01_1 | 2024-06-01_1 | 2024-06-01_2 | 2024-07-01_1 |
---|---|---|---|---|
Aidan | A6 | null | null | null |
Alice | null | A4 | null | A5 |
Roger | null | A1 | A2 | A3 |