I have data that was sent to me, and I need to normalize it. The data is in a sql table, but each row has multiple multi value columns. An example is the following:
ID fname lname projects projdates
1 John Doe projA;projB;projC 20150701;20150801;20150901
2 Jane Smith projD;;projC 20150701;;20150902
3 Lisa Anderson projB;projC 20150801;20150903
4 Nancy Johnson projB;projC;projE 20150601;20150822;20150904
5 Chris Edwards projA 20150905
Needs too look like this:
ID fname lname projects projdates
1 John Doe projA 20150701
1 John Doe projB 20150801
1 John Doe projC 20150901
2 Jane Smith projD 20150701
2 Jane Smith projC 20150902
3 Lisa Anderson projB 20150801
3 Lisa Anderson projC 20150903
4 Nancy Johnson projB 20150601
4 Nancy Johnson projC 20150822
4 Nancy Johnson projE 20150904
5 Chris Edwards projA 20150905
I need to split it into rows for the id, fname, lname, and parsing the projects and proddates into separate records. I have found many posts with split functions and I can get it to work for 1 column, but not 2. When I do 2 columns it permeates the split. ie for John Doe, it gives me records for projA 3 times, once for each of the proddates. I need to coorelate each multivalue project record with only it's respective projdate and not the others.
Any thoughts?
Thanks!
If you use Jeff Moden's "DelimitedSplit8K" splitter (Which I have renamed here "fDelimitedSplit8K")
(Ref. Figure 21: The Final "New" Splitter Code, Ready for Testing)
to do the heavy lifting for the splits, the rest becomes fairly straightforward, using CROSS APPLY and WHERE to get the proper joining.
IF object_ID (N'tempdb..#tInputData') is not null
DROP TABLE #tInputData
CREATE TABLE #tInputData (
ID INT
PRIMARY KEY CLUSTERED -- Add IDENTITY if ID needs to be set at INSERT time
, FName VARCHAR (30)
, LName VARCHAR (30)
, Projects VARCHAR (4000)
, ProjDates VARCHAR (4000)
)
INSERT INTO #tInputData
( ID, FName, LName, Projects, ProjDates )
VALUES
( 1, 'John', 'Doe' , 'projA;projB;projC' , '20150701;20150801;20150901'),
( 2, 'Jane', 'Smith' , 'projD;;projC' , '20150701;;20150902'),
( 3, 'Lisa', 'Anderson' , 'projB;projC' , '20150801;20150903'),
( 4, 'Nancy', 'Johnson' , 'projB;projC;projE' , '20150601;20150822;20150904'),
( 5, 'Chris', 'Edwards' , 'projA' , '20150905')
SELECT * FROM #tInputData -- Take a look at the INSERT results
; WITH ResultSet AS
(
SELECT
InData.ID
, InData.FName
, InData.LName
, ProjectList.ItemNumber AS ProjectID
, ProjectList.Item AS Project
, DateList.ItemNumber AS DateID
, DateList.Item AS ProjDate
FROM #tInputData AS InData
CROSS APPLY dbo.fDelimitedSplit8K(InData.Projects,';') AS ProjectList
CROSS APPLY dbo.fDelimitedSplit8K(InData.ProjDates,';') AS DateList
WHERE DateList.ItemNumber = ProjectList.ItemNumber -- Links projects and dates in left-to-r1ght order
AND (ProjectList.Item <> '' AND DateList.Item <> '') -- Ignore input lines when both Projects and ProjDates have no value; note that these aren't NULLs.
)
SELECT
ID
, FName
, LName
, Project
, ProjDate
FROM ResultSet
ORDER BY ID, Project
Results in
ID FName LName Project ProjDate
-- ----- -------- ------- --------
1 John Doe projA 20150701
1 John Doe projB 20150801
1 John Doe projC 20150901
2 Jane Smith projC 20150902
2 Jane Smith projD 20150701
3 Lisa Anderson projB 20150801
3 Lisa Anderson projC 20150903
4 Nancy Johnson projB 20150601
4 Nancy Johnson projC 20150822
4 Nancy Johnson projE 20150904
5 Chris Edwards projA 20150905
This algorithm handles Project and Date lists of equal length. Should one list be shorter than the other for a given row, some special attention will be needed to apply the NULL in the proper place.
-- Cleanup
DROP TABLE #tInputData