sqlsql-serverredgate

Update a SQL Column dynamically based on a substring of another column


I am using Microsoft SQL Server 2019 15.0.4360.2 (X64)

Lets say I have a table in my SQL server

Code Keyword Description Component_Keys
A1 Sally Sally : Mary is her Daughter
A2 Mary Mary : John is her sister and sally is her mother
A3 Jane Jane : John is her partner
A4 John John : Jane is his partner and Sally is his mother and Mary is his sister. Mary is not his partner

I would like to find a way to dynamically update Component_Keys such that it loops through each description, identifies an instance of the Keyword within the description , and records the Code in the column Component_Keys if it finds an instance of the Keyword. If a keyword appears more than once in the description, I would like it to appear only once in Component_Keys. If multiple keywords appear I would like it to be recorded as Code1+Code2+Code3.....

Lastly, I would like it to ignore the keyword in the description if it is in the same row.

As such the output of the update query would look like this

Code Keyword Description Component_Keys
A1 Sally Sally : Mary is her Daughter A2
A2 Mary Mary : John is her sister and sally is her mother A4+A1
A3 Jane Jane : John is her partner A4
A4 John John : Jane is his partner and Sally is his mother and Mary is his sister. Mary is not his partner A3+A1+A2

I have the following code so far

IF OBJECT_ID(N'tempdb..#temp') IS NOT NULL
BEGIN -- Checking to ensure temp table does not exist, and dropping anyone that does to avoid errors

    DROP TABLE #temp;
END;

GO

--- Create Temp Table to store values and manipulate before inserting into production table ----

CREATE TABLE #temp
(
Code VARCHAR(MAX)
, Keyword VARCHAR(MAX)
, Description VARCHAR(MAX)
, Component_Keys VARCHAR(MAX)
);

INSERT INTO #temp (Code
                 , Keyword
                 , Description
                 , Component_Keys)
VALUES ('A1', 'Sally', 'Sally : Mary is her Daughter', NULL)
     , ('A2', 'Mary', 'Mary : John is her sister and sally is her mother', NULL)
     , ('A3', 'Jane', 'Jane : John is her partner', NULL)
     , ('A4', 'John'
      , 'John : Jane is his partner and Sally is his mother and Mary is his sister. Mary is not his partner', NULL);
SELECT *
FROM   #temp;


-- Creating Variables
DECLARE @DKeyword VARCHAR(MAX); -- Declaring Dynamic Keyword Variable
DECLARE @DUpdateStatement VARCHAR(MAX); ---Declaring Dynamic Update Statement Variable
DECLARE @DCode VARCHAR(MAX) -- Declaring Dynamic Code Variable

SELECT @DKeyword= Keyword, @DCode=code
FROM #temp

UPDATE #temp SET Component_Keys = @DUpdateStatement

I am unsure of how to proceed from here on the following:

I don't really understand how to draft the update statement variable here such that

  1. It ignores the keyword if it is in the same row
  2. Ignores the duplicates of the same key word
  3. Formats it in the desired format (A1+A2+A3....)

Solution

  • For each row, you need to:

    1. Identify all relatives having names contained in the current row's description.
    2. Concatenate the those keys using "+" as a separator.
    3. Update the Component_Keys column with the result.

    The first step requires a subquery to examine all rows from a second reference to the source table and check if the Key is contained in the main rows's Description. The CHARINDEX() function is convenient for this purpose.

    The STRING_AGG() function will accomplish the second step. The WITHIN GROUP(ORDER BY ...) clause is used to define the order of the components, if order is important. Here I used the position from the first step.

    UPDATE T
    SET Component_Keys = (
        SELECT STRING_AGG(T2.Code, '+') WITHIN GROUP(ORDER BY P.Pos)
        FROM #temp T2
        CROSS APPLY(SELECT CHARINDEX(T2.Keyword, T.Description) AS Pos) P
        WHERE T2.Code <> T.Code
        AND P.Pos >= 1
    )
    FROM #temp T
    

    You can also use a CROSS APPLY to separate the calculation from the SET clause, which some may find more readable.

    UPDATE T
    SET Component_Keys = C.CalculatedKeys
    FROM #temp T
    CROSS APPLY (
        SELECT STRING_AGG(T2.Code, '+') WITHIN GROUP(ORDER BY P.Pos) AS CalculatedKeys
        FROM #temp T2
        CROSS APPLY(SELECT CHARINDEX(T2.Keyword, T.Description) AS Pos) P
        WHERE T2.Code <> T.Code
        AND P.Pos >= 1
    ) C
    

    Both have the same results:

    Code Keyword Description Component_Keys
    A1 Sally Sally : Mary is her Daughter A2
    A2 Mary Mary : John is her sister and sally is her mother A4+A1
    A3 Jane Jane : John is her partner A4
    A4 John John : Jane is his partner and Sally is his mother and Mary is his sister. Mary is not his partner A3+A1+A2

    See this db<>fiddle for a demo.

    Note that text processing functionality in SQL Server is very limited. Ideally, instead of searching just for "Ron", you might prefer to only match whole words using the regular expression "\bRon\b", where "\b" matches word-boundaries, so that similar names or words like "Ronald" and "Sharon" are not matched. Unfortunately, SQL Server does not support regular expressions.

    As a workaround, you could replace all punctuation with spaces, prepend, and append extra spaces and search for " Ron ", but this starts to get much more complex.