sqlparsingsplitmultivalue

SQL Split Multiple Multivalue Columns into Rows


I have data that was sent to me, and I need to normalize it. The data is in a sql table, but each row has multiple multi value columns. An example is the following:

ID  fname   lname       projects           projdates
1   John    Doe         projA;projB;projC  20150701;20150801;20150901
2   Jane    Smith       projD;;projC       20150701;;20150902
3   Lisa    Anderson    projB;projC        20150801;20150903
4   Nancy   Johnson     projB;projC;projE  20150601;20150822;20150904
5   Chris   Edwards     projA              20150905

Needs too look like this:

ID  fname   lname      projects projdates
1   John    Doe          projA  20150701
1   John    Doe          projB  20150801
1   John    Doe          projC  20150901
2   Jane    Smith        projD  20150701
2   Jane    Smith        projC  20150902
3   Lisa    Anderson     projB  20150801
3   Lisa    Anderson     projC  20150903
4   Nancy   Johnson      projB  20150601
4   Nancy   Johnson      projC  20150822
4   Nancy   Johnson      projE  20150904
5   Chris   Edwards      projA  20150905

I need to split it into rows for the id, fname, lname, and parsing the projects and proddates into separate records. I have found many posts with split functions and I can get it to work for 1 column, but not 2. When I do 2 columns it permeates the split. ie for John Doe, it gives me records for projA 3 times, once for each of the proddates. I need to coorelate each multivalue project record with only it's respective projdate and not the others.

Any thoughts?

Thanks!


Solution

  • If you use Jeff Moden's "DelimitedSplit8K" splitter (Which I have renamed here "fDelimitedSplit8K") (Ref. Figure 21: The Final "New" Splitter Code, Ready for Testing)
    to do the heavy lifting for the splits, the rest becomes fairly straightforward, using CROSS APPLY and WHERE to get the proper joining.

    IF object_ID (N'tempdb..#tInputData') is not null 
       DROP TABLE #tInputData
    
    CREATE TABLE #tInputData (
         ID        INT 
            PRIMARY KEY CLUSTERED  -- Add IDENTITY if ID needs to be set at INSERT time
       , FName     VARCHAR (30)
       , LName     VARCHAR (30)
       , Projects  VARCHAR (4000)
       , ProjDates VARCHAR (4000)
    )
    
    INSERT INTO #tInputData
             ( ID, FName, LName, Projects, ProjDates )
    VALUES
       ( 1, 'John',  'Doe'      , 'projA;projB;projC' , '20150701;20150801;20150901'),
       ( 2, 'Jane',  'Smith'    , 'projD;;projC'      , '20150701;;20150902'),
       ( 3, 'Lisa',  'Anderson' , 'projB;projC'       , '20150801;20150903'),
       ( 4, 'Nancy', 'Johnson'  , 'projB;projC;projE' , '20150601;20150822;20150904'),
       ( 5, 'Chris', 'Edwards'  , 'projA'             , '20150905')
    
    SELECT * FROM #tInputData  -- Take a look at the INSERT results
    
    ; WITH ResultSet  AS 
    (
       SELECT 
            InData.ID
          , InData.FName
          , InData.LName
          , ProjectList.ItemNumber AS ProjectID
          , ProjectList.Item AS Project
          , DateList.ItemNumber AS DateID
          , DateList.Item AS ProjDate
       FROM #tInputData AS InData
       CROSS APPLY dbo.fDelimitedSplit8K(InData.Projects,';') AS ProjectList
       CROSS APPLY dbo.fDelimitedSplit8K(InData.ProjDates,';') AS DateList
       WHERE DateList.ItemNumber = ProjectList.ItemNumber  -- Links projects and dates in left-to-r1ght order
       AND (ProjectList.Item <> '' AND DateList.Item <> '') -- Ignore input lines when both Projects and ProjDates have no value; note that these aren't NULLs.
    )
    SELECT 
          ID
        , FName
        , LName
        , Project
        , ProjDate 
    FROM ResultSet
    ORDER BY ID, Project
    

    Results in

    ID  FName  LName     Project  ProjDate  
    --  -----  --------  -------  --------  
     1  John   Doe       projA    20150701  
     1  John   Doe       projB    20150801  
     1  John   Doe       projC    20150901  
     2  Jane   Smith     projC    20150902  
     2  Jane   Smith     projD    20150701  
     3  Lisa   Anderson  projB    20150801  
     3  Lisa   Anderson  projC    20150903  
     4  Nancy  Johnson   projB    20150601  
     4  Nancy  Johnson   projC    20150822  
     4  Nancy  Johnson   projE    20150904  
     5  Chris  Edwards   projA    20150905  
    

    This algorithm handles Project and Date lists of equal length. Should one list be shorter than the other for a given row, some special attention will be needed to apply the NULL in the proper place.

    -- Cleanup
    DROP TABLE #tInputData