I have 2 columns A & B in a SQL Server table that contain a string aggregated list of codes.
The codes in the lists are separated by a semicolon.
The string aggregated lists can take any length and the codes can be in any order.
What I would like to do is to create a new column C which contains the codes that appear somewhere in the same row of the lists of Column A and Column B.
Example:
Column_A | Column_B | Column_C |
---|---|---|
a;b;c;d;e | c;a;e;i;k | c;a;e |
d;e;f;g | e;h;i;j;d | e;d |
The example above returns "c", "a" and "e" for the first row of column C, because these codes are present in the same row of both Column A and Column B.
The same for the second row, here "e" and "d" are overlapping in Column A and B and thus returned in Column C.
I have tried something that works, but it does not seem like the best solution in terms of efficiency and performance. Especially because I have many (1m+) rows to check this for and the length of code lists to compare can be very long.
SELECT
STRING_AGG(CAST([value] AS NVARCHAR(MAX)),'; ') AS Overlapping_Code
FROM
(SELECT a.value
FROM MyTable t
CROSS APPLY STRING_SPLIT(t.Column_A, ';') a
INTERSECT
SELECT b.value
FROM MyTable t
CROSS APPLY STRING_SPLIT(t.Column_B, ';') b
) ab
I am looking for a better solution in terms of performance and elegance to compare the string aggregated lists for two columns across (many) rows.
Here is another possible solution:
CREATE TABLE MyTable (a VARCHAR(100) NOT NULL, b VARCHAR(100) NOT NULL /* PRIMARY KEY ?? */);
INSERT INTO MyTable VALUES ('a;b;c;d;e','c;a;e;i;k'),('d;e;f;g','e;h;i;j;d');
SELECT *,
(SELECT STRING_AGG(x.value,';')
FROM STRING_SPLIT(a,';') AS x
, STRING_SPLIT(b,';') AS z
WHERE x.value=z.value)
FROM mytable AS t;
a b c
---------- ---------- -------
a;b;c;d;e c;a;e;i;k c;a;e
d;e;f;g e;h;i;j;d e;d