sqlsql-serversql-server-2019

Remove substring between two characters in SQL Column


Lets say I have a table T with a column called S

Code S
1 \String\ String
2 Laser \Laser\
3 La\ser\s and \S\trings

I want to remove every substring in each row of T.S that is between \ and \ and replace it with nothing such that Table T becomes

Code S
1 string
2 Laser
3 Las and trings

I have figured out how to identify the characters between the first instance of \ and \

IF OBJECT_ID(N'tempdb..#t') IS NOT NULL
BEGIN -- Checking to ensure temp table does not exist, and dropping anyone that does to avoid errors

    DROP TABLE #t;
END;

GO

--- Create Temp Table to store values and manipulate before inserting into production table ----

CREATE TABLE #t
(
Code VARCHAR(MAX)
,S VARCHAR(MAX)
);

INSERT INTO #t (Code ,S )
VALUES ('1','\String\ String'),
        ('2','Laser \Laser\'),
        ('3', 'La\ser\s and \S\trings')
SELECT *
FROM   #t;


SELECT REPLACE(REPLACE(SUBSTRING(s, start_pos,  end_pos - start_pos  
    +1), '\', '\'), '\', '\')

FROM   (SELECT s,
               CHARINDEX('\', s) AS start_pos, 
               CHARINDEX('\', s, CHARINDEX('\', s) + 1) AS end_pos
        FROM   #t) t 

This returns

S
\String\
\Laser\
\ser\

Where I am stuck is on

  1. How do I get it to apply to all instances of \ \ in the same line individually (see line 3)

  2. Apply this change to the column in an update (I'm thinking a cross apply might be useful here)


Solution

  • If this were SQL Server 2022 (or later) we could:

    1. Use STRING_SPLIT(,,1) to break up the string at the \ delimiters and assign each part an ordinal value.
    2. Filter out the even substrings (leaving the odd).
    3. Reassemble the remaining parts using STRING_AGG().
    4. Add a TRIM() to clean up leading or trailing whitespace.

    Something like:

    SELECT
        T.Code,
        TRIM(STRING_AGG(SS.value, '') WITHIN GROUP(ORDER BY SS.ordinal)) AS Answer
    FROM #t T
    CROSS APPLY STRING_SPLIT(T.S, '\', 1) SS
    WHERE SS.ordinal % 2 != 0 -- Keep the odd parts
    GROUP BY T.Code
    

    The third parameter of STRING_SPLIT() generates an ordinal along side each value (available in SQL Server 2022 and later). This is referenced in the WITHIN GROUP(...) clause of the STRING_AGG() function to guarantee the proper ordering or the reassembled components.

    If this is part of a more complicated query, you can encapsulate the logic within a CROSS APPLY whose result can then be referenced elsewhere.

    SELECT
        T.Code,
        CA.Answer
    FROM #t T
    CROSS APPLY (
        SELECT TRIM(STRING_AGG(SS.value, '') WITHIN GROUP(ORDER BY SS.ordinal)) AS Answer
        FROM STRING_SPLIT(T.S, '\', 1) SS
        WHERE SS.ordinal % 2 != 0 -- Keep the odd parts
    ) CA
    ORDER BY T.Code
    

    For SQL Server 2019, the STRING_SPLIT() function does not support the enable_ordinal option, so this function cannot be used when we need to guarantee order. An alternative is to use a technique that first maps the string into a JSON array, and then uses OPENJSON() to parse that array.

    SELECT
        T.Code,
        CA.Answer
    FROM #t T
    CROSS APPLY (
        SELECT TRIM(STRING_AGG(SS.value, '') WITHIN GROUP(ORDER BY SS.ordinal)) AS Answer
        FROM (
            SELECT J.value, CONVERT(int, [key]) + 1 AS ordinal
            FROM OPENJSON(CONCAT('["', REPLACE(STRING_ESCAPE(T.S, 'json'), '\\', '","'), '"]')) J
        ) SS
        WHERE SS.ordinal % 2 != 0 -- Keep the odd parts
    ) CA
    ORDER BY T.Code
    

    The above will convert the source string into a JSON array that is then parsed into individual elements by OPENJSON(). The STRING_ESCAPE() function protects the JSON against certain special characters (particularly any embedded double quotes) that would otherwise cause an error. The key value returned by OPENJSON() above is a zero-based string that must be converted to an integer and offset by 1 to match the STRING_SPLIT() ordinal.

    NOTE: There are several STRING_SPLIT() alternatives out there, but few yield a reliable ordinal value. (There are actually several bad answers out there that use post-split numbering logic that appears to work, but whose results are not guaranteed by any documented feature, and which may break at any time - especially when scaled up.)

    The results are the same for each of the above:

    Code Answer
    1 String
    2 Laser
    3 Las and trings

    See this db<>fiddle for a demo.

    An update in place can be performed with the following:

    UPDATE T
    SET S = CA.Answer
    FROM #t T
    CROSS APPLY (
        SELECT TRIM(STRING_AGG(SS.value, '') WITHIN GROUP(ORDER BY SS.ordinal)) AS Answer
        FROM (
            SELECT J.value, CONVERT(int, [key]) + 1 AS ordinal
            FROM OPENJSON(CONCAT('["', REPLACE(STRING_ESCAPE(T.S, 'json'), '\\', '","'), '"]')) J
        ) SS
        WHERE SS.ordinal % 2 != 0 -- Keep the odd parts
    ) CA
    

    Or if you prefer to include the calculation directly in the select statement:

    UPDATE T
    SET S = (
        SELECT TRIM(STRING_AGG(SS.value, '') WITHIN GROUP(ORDER BY SS.ordinal))
        FROM (
            SELECT J.value, CONVERT(int, [key]) + 1 AS ordinal
            FROM OPENJSON(CONCAT('["', REPLACE(STRING_ESCAPE(T.S, 'json'), '\\', '","'), '"]')) J
        ) SS
        WHERE SS.ordinal % 2 != 0 -- Keep the odd parts
    )
    FROM #t T
    

    See this updated db<>fiddle.