sqlexceloraclepivotpowerquery

How to transpose subset of columns into rows and group by remainder of rows? Excel? SQL?


We are in the early stages of migrating some data from a legacy system to a new system that seems to have some quirks.

Currently we have some data that is as follows

Name      Date      Award
Roger  2024-06-01     A1
Roger  2024-06-01     A2
Roger  2024-07-01     A3
Alice  2024-06-01     A4
Alice  2024-07-01     A5
Aidan  2024-01-01     A6

I was trying to get the data into the following format

Name   2024-06-01  2024-06-01  2024-07-01  2024-01-01
Roger       A1         A2          A3          -
Alice       A4          -          A5          -
Aidan       -           -          -           A6

Few issues with the data is that the number of rows can be variable. Thus when pivoting the number of columns can also be variable. The other issue is if I was trying to pivot on "Date" there can be times when an individual may get more then one award on a day, so the "Date" column could also be duplicated a few times.

I tried to do a search and found a few different solutions one via SQL (our data is coming from Oracle) and the next option was Excel Power Query.

When I searched for a SQL solution I found Oracle has the pivot function but this would require aggregation. I was then able to find maybe I could use a MAX or MIN function (conditional aggregation?) - But unfortunately this would cause some data to be lost -> For example with "Roger" on 2024-06-01 only one award would be in the output.

Looking for an Excel option I was able to find an article that suggested using Excel Power Query to pivot the table data. This seemed to work fine until I hit this row:

Name      Date      Award
Roger  2024-06-01     A1
Roger  2024-06-01     A2

It threw an error alluding to how it can not aggregate the row to column since there were 2 values of 2024-06-01. Again if I used MAX/MIN it would fix the error but then I would lose either the A1 or A2 award data.

I know Pivot is fundamentally meant to aggregate/group values so I am having a tough time finding how to get the desired output. Not sure if SQL or Excel would be the best way to go about things or if this is totally nonsensical request I would appreciate any ideas or input


Solution

  • In Oracle, you can use PIVOT and LISTAGG:

    SELECT *
    FROM   table_name
           PIVOT(
             LISTAGG(award, ',')
             FOR dt IN (
               DATE '2024-01-01' AS "2024-01-01",
               DATE '2024-06-01' AS "2024-06-01",
               DATE '2024-07-01' AS "2024-07-01"
             )
           )
    

    Which, for the sample data:

    CREATE TABLE table_name (Name, DT, Award) AS
    SELECT 'Roger',  DATE '2024-06-01', 'A1' FROM DUAL UNION ALL
    SELECT 'Roger',  DATE '2024-06-01', 'A2' FROM DUAL UNION ALL
    SELECT 'Roger',  DATE '2024-07-01', 'A3' FROM DUAL UNION ALL
    SELECT 'Alice',  DATE '2024-06-01', 'A4' FROM DUAL UNION ALL
    SELECT 'Alice',  DATE '2024-07-01', 'A5' FROM DUAL UNION ALL
    SELECT 'Aidan',  DATE '2024-01-01', 'A6' FROM DUAL;
    

    Outputs:

    NAME 2024-01-01 2024-06-01 2024-07-01
    Aidan A6 null null
    Alice null A4 A5
    Roger null A1,A2 A3

    Or, numbering the rows before pivoting:

    SELECT *
    FROM   (
              SELECT t.*,
                     ROW_NUMBER() OVER (PARTITION BY name, dt ORDER BY award) AS rn
              FROM   table_name t
           )
           PIVOT(
             MAX(award)
             FOR (dt, rn) IN (
               (DATE '2024-01-01', 1) AS "2024-01-01_1",
               (DATE '2024-06-01', 1) AS "2024-06-01_1",
               (DATE '2024-06-01', 2) AS "2024-06-01_2",
               (DATE '2024-07-01', 1) AS "2024-07-01_1"
             )
           )
    

    Outputs:

    NAME 2024-01-01_1 2024-06-01_1 2024-06-01_2 2024-07-01_1
    Aidan A6 null null null
    Alice null A4 null A5
    Roger null A1 A2 A3

    fiddle