sql-server t-sql delimiter charindex patindex

Split String based on position and delimiters

I am in need of assistance. A question about SQL on CHARINDEX, PATINDEX, SUBSTRING, LEFT and RIGHT.

I have strings that I need to split based on position and delimiter. Using some of it as Column name and the other part as data. and group this based on row id. And then sum the values if needed.

Sample data, ID, String

1    20:4:10:1:20:3:
2    20:1:
3    10:3:
4    30:4:40:1:50:3:

And the needed outcome is

id  10  20  30  40  50
1   1   7
2       1
3   3
4           4   1   3

Data:

CREATE TABLE #Split (ID int, SplitString varchar(450))
Insert into #Split (ID, SplitString) values 
(1, '20:4:10:1:20:3:'),
(2, '20:1:'),
(3, '10:3:'),
(4, '30:4:40:1:50:3:')

select * From #Split
DROP TABLE #Split

Any help would be greatly appreciated!

The data is also possible to be as such:

CREATE TABLE #Split (ID int, String1 varchar(10),  String1Quantity int, String2 varchar(10), String2Quantity int, String3 VARCHAR(10), String3Quantity int, String4 varchar(10), String4Quantity int, String5 varchar(10), String5Quantity int)

Insert into #Split (ID, String1, String1Quantity, String2, String2Quantity, String3, String3Quantity, String4, String4Quantity, String5, String5Quantity) values 
(1,'20',4,'10',1,'20',3, null, null, null, null),
(2,'20',1,null,null,null,null, null, null, null, null),
(3,'10',3,null,null,null,null, null, null, null, null),
(4,'30',4,'40',1,'50',3, null, null, null, null)

select * From #Split


DROP TABLE #Split

Solution

First, as I wrote in my comment: The solution for this is to fix your broken data model. For more information, read Is storing a delimited list in a database column really that bad?, where you will see a lot of reasons why the answer to this question is Absolutely yes!

That being said, I know that a lot of times, for many different reasons, changing the database structure is not an option, even though this would solve a multitude of problems.

Now, based on your string manipulation functions, I assume you're working with SQL-Server (Please note that for every sql question you should always specify the rdbms and version, or use database-agnostic if you really need a cross-database solution).

So based on your assumption, your best bet is probably to use Jeff Moden's DelimitedSplit8K UDF to break the string into a table.
This function will return a table with two columns - one for the substring, and one for it's index in the source string - which is exactly what you need to start constructing the results you're after.

So based on that function, a couple of common table expressions, pivot and conditional aggregation, I came up with the following solution:

WITH CTE1 AS
(
    SELECT  Id,
            CAST(IIF(ItemNumber % 2 = 0, Item, NULL) as int) As Data, 
            IIF(ItemNumber % 2 = 1, Item, NULL) As Name, 
            IIF(ItemNumber % 2 = 0, ItemNumber - 1, ItemNumber) As PairNumber
    FROM  #Split
    CROSS APPLY dbo.DelimitedSplit8K(SplitString, ':')
    WHERE Item IS NOT NULL
    AND Item <> ''
), CTE2 AS
(
    SELECT Id, PairNumber, MAX(Data) As Data, MAX(Name) As Name
    FROM CTE1
    GROUP BY Id, PairNumber
)

, CTEPivot AS
(
    SELECT Id, [10], [20], [30], [40], [50]
    FROM 
    (
    SELECT Id, SUM(Data) As Data, Name
    FROM CTE2 
    GROUP BY Id, Name
    ) D
    PIVOT  
    (  
    AVG(Data) 
    FOR Name IN ([10], [20], [30], [40], [50])  
    ) AS PivotTable
)

SELECT  Id, 
        MAX([10]) As [10], 
        MAX([20]) As [20], 
        MAX([30]) As [30], 
        MAX([40]) As [40], 
        MAX([50]) As [50] 
FROM CTEPivot
GROUP BY Id;

Results:

+----+----+----+----+----+----+
| Id | 10 | 20 | 30 | 40 | 50 |
+----+----+----+----+----+----+
| 1  | 1  | 7  |    |    |    |
+----+----+----+----+----+----+
| 2  |    | 1  |    |    |    |
+----+----+----+----+----+----+
| 3  | 3  |    |    |    |    |
+----+----+----+----+----+----+
| 4  |    |    | 4  | 1  | 3  |
+----+----+----+----+----+----+

You can see a live demo on rextester.