I am using Microsoft SQL Server 2019 15.0.4360.2 (X64)
Lets say I have a table in my SQL server
Code | Keyword | Description | Component_Keys |
---|---|---|---|
A1 | Sally | Sally : Mary is her Daughter | |
A2 | Mary | Mary : John is her sister and sally is her mother | |
A3 | Jane | Jane : John is her partner | |
A4 | John | John : Jane is his partner and Sally is his mother and Mary is his sister. Mary is not his partner |
I would like to find a way to dynamically update Component_Keys such that it loops through each description, identifies an instance of the Keyword within the description , and records the Code in the column Component_Keys if it finds an instance of the Keyword. If a keyword appears more than once in the description, I would like it to appear only once in Component_Keys. If multiple keywords appear I would like it to be recorded as Code1+Code2+Code3.....
Lastly, I would like it to ignore the keyword in the description if it is in the same row.
As such the output of the update query would look like this
Code | Keyword | Description | Component_Keys |
---|---|---|---|
A1 | Sally | Sally : Mary is her Daughter | A2 |
A2 | Mary | Mary : John is her sister and sally is her mother | A4+A1 |
A3 | Jane | Jane : John is her partner | A4 |
A4 | John | John : Jane is his partner and Sally is his mother and Mary is his sister. Mary is not his partner | A3+A1+A2 |
I have the following code so far
IF OBJECT_ID(N'tempdb..#temp') IS NOT NULL
BEGIN -- Checking to ensure temp table does not exist, and dropping anyone that does to avoid errors
DROP TABLE #temp;
END;
GO
--- Create Temp Table to store values and manipulate before inserting into production table ----
CREATE TABLE #temp
(
Code VARCHAR(MAX)
, Keyword VARCHAR(MAX)
, Description VARCHAR(MAX)
, Component_Keys VARCHAR(MAX)
);
INSERT INTO #temp (Code
, Keyword
, Description
, Component_Keys)
VALUES ('A1', 'Sally', 'Sally : Mary is her Daughter', NULL)
, ('A2', 'Mary', 'Mary : John is her sister and sally is her mother', NULL)
, ('A3', 'Jane', 'Jane : John is her partner', NULL)
, ('A4', 'John'
, 'John : Jane is his partner and Sally is his mother and Mary is his sister. Mary is not his partner', NULL);
SELECT *
FROM #temp;
-- Creating Variables
DECLARE @DKeyword VARCHAR(MAX); -- Declaring Dynamic Keyword Variable
DECLARE @DUpdateStatement VARCHAR(MAX); ---Declaring Dynamic Update Statement Variable
DECLARE @DCode VARCHAR(MAX) -- Declaring Dynamic Code Variable
SELECT @DKeyword= Keyword, @DCode=code
FROM #temp
UPDATE #temp SET Component_Keys = @DUpdateStatement
I am unsure of how to proceed from here on the following:
I don't really understand how to draft the update statement variable here such that
For each row, you need to:
Component_Keys
column with the result.The first step requires a subquery to examine all rows from a second reference to the source table and check if the Key
is contained in the main rows's Description
. The CHARINDEX()
function is convenient for this purpose.
The STRING_AGG()
function will accomplish the second step. The WITHIN GROUP(ORDER BY ...)
clause is used to define the order of the components, if order is important. Here I used the position from the first step.
UPDATE T
SET Component_Keys = (
SELECT STRING_AGG(T2.Code, '+') WITHIN GROUP(ORDER BY P.Pos)
FROM #temp T2
CROSS APPLY(SELECT CHARINDEX(T2.Keyword, T.Description) AS Pos) P
WHERE T2.Code <> T.Code
AND P.Pos >= 1
)
FROM #temp T
You can also use a CROSS APPLY
to separate the calculation from the SET
clause, which some may find more readable.
UPDATE T
SET Component_Keys = C.CalculatedKeys
FROM #temp T
CROSS APPLY (
SELECT STRING_AGG(T2.Code, '+') WITHIN GROUP(ORDER BY P.Pos) AS CalculatedKeys
FROM #temp T2
CROSS APPLY(SELECT CHARINDEX(T2.Keyword, T.Description) AS Pos) P
WHERE T2.Code <> T.Code
AND P.Pos >= 1
) C
Both have the same results:
Code | Keyword | Description | Component_Keys |
---|---|---|---|
A1 | Sally | Sally : Mary is her Daughter | A2 |
A2 | Mary | Mary : John is her sister and sally is her mother | A4+A1 |
A3 | Jane | Jane : John is her partner | A4 |
A4 | John | John : Jane is his partner and Sally is his mother and Mary is his sister. Mary is not his partner | A3+A1+A2 |
See this db<>fiddle for a demo.
Note that text processing functionality in SQL Server is very limited. Ideally, instead of searching just for "Ron", you might prefer to only match whole words using the regular expression "\bRon\b", where "\b" matches word-boundaries, so that similar names or words like "Ronald" and "Sharon" are not matched. Unfortunately, SQL Server does not support regular expressions.
As a workaround, you could replace all punctuation with spaces, prepend, and append extra spaces and search for " Ron ", but this starts to get much more complex.