sqlsql-serversql-server-2019

Remove variable length prefix of a string


In the query shown here, I need to keep the numerical part of the column code1 when it starts with T and concatenate it with code2 to get the code I want.

With other RDBMS, I would have used regex replace.

I came up with the solution below. Is it optimal? Are there better solutions?

SELECT 
    code1,
    code2,
    CONCAT(ISNULL(CAST(TRY_CAST(REPLACE(code1, 'T', '')  AS int) AS nvarchar), code1), code2) AS concatenated_code
FROM 
    (VALUES
        ('T000001524','A001'),
        ('T000001530','A001'), 
        ('S01','A001'),
        ('T000003705','A001'),
        ('T000000091','A001'),
        ('S09','A004'),
        ('T000000961','A002')
    ) customer (code1, code2)

The output I am looking for is

code1 code2 concatenated_code
T000001524 A001 1524A001
T000001530 A001 1530A001
S01 A001 S01A001
T000003705 A001 3705A001
T000000091 A001 91A001
S09 A004 S09A004
T000000961 A002 961A002

Solution

  • Assuming you're on SQL Server 2022 (as you don't denote you aren't using the latest version), you could just use a trim function (LTRIM) to remove the leading 'T' and '0' characters:

    SELECT code1,
           code2,
           CONCAT(LTRIM(customer.code1,'T0'),customer.code2) AS concatenated_code
    FROM (VALUES ('T000001524', 'A001'),
                 ('T000001530', 'A001'),
                 ('S01', 'A001'),
                 ('T000003705', 'A001'),
                 ('T000000091', 'A001'),
                 ('S09', 'A004'),
                 ('T000000961', 'A002')) customer (code1, code2);
    

    Otherwise, I would probably use STUFF and PATINDEX to remove the characters to the first non-T/0 character.

    SELECT code1,
           code2,
           CONCAT(STUFF(customer.code1,1,PATINDEX('%[^T0]%',customer.code1)-1,''),customer.code2) AS concatenated_code
    FROM (VALUES ('T000001524', 'A001'),
                 ('T000001530', 'A001'),
                 ('S01', 'A001'),
                 ('T000003705', 'A001'),
                 ('T000000091', 'A001'),
                 ('S09', 'A004'),
                 ('T000000961', 'A002')) customer (code1, code2);
    

    If you don't have a non-T/0 character in your value, you could concatenate on at the end of the search expression: customer.code1+'x'

    As for what is "optimal", you're in the best position to test.