sqlsql-serversubstringpatindex

Extract String for specific characters SQL Server


Trying to extract specific string from a varchar column in SQL Server. And then from that string I need the last 4 digits only.

I tried substring & patindex but can't figure out how to get the last 4 digits.

This was one attempt but it only returns the first part of the text:

SUBSTRING([column], (PATINDEX('%ID0%-[0-9][0-9][0-9][0-9][0-9]%',[column])),9) 

Here's some sample data:

Brian Larry, Hays Hill XX, ID007370, Option code - 0123 4567, AAA - XX
ID007366 - Dave Jones - XX - Option Code 0121 9999
ID00 7120  Brian Smith XX  Branded company
ID07113 Gary Barnes  - LLL 0123 9111 AAA LLL
ID00 7120  Charles Old XX   Recall operation
ID0007439 - Kerry Hill - Maidstone XX - Option Code 0124 234 BBB XX
ID006817 Paul George Jackson 1234 8464 AAA Recall operation

I need to extract the ID00000 number, but then only return the last 4 digits. So for ID007370, I only want to return 7370. The ID numbers can vary in length.


Solution

  • Something like this looks to be working:

    SELECT  RIGHT(SUBSTRING(string_nonspace, y.start, PATINDEX('%[^0-9]%', STUFF(string_nonspace, 1, start + 1, '')) + 1), 4)
    FROM    (
        VALUES  (N'Brian Larry, Hays Hill XX, ID007370, Option code - 0123 4567, AAA - XX')
        ,   (N'ID007366 - Dave Jones - XX - Option Code 0121 9999')
        ,   (N'ID00 7120  Brian Smith XX  Branded company')
        ,   (N'ID07113 Gary Barnes  - LLL 0123 9111 AAA LLL')
        ,   (N'ID00 7120  Charles Old XX   Recall operation')
        ,   (N'ID0007439 - Kerry Hill - Maidstone XX - Option Code 0124 234 BBB XX')
        ,   (N'ID006817 Paul George Jackson 1234 8464 AAA Recall operation')
    ) t (col1)
    CROSS APPLY (
            SELECT  replace(col1, ' ', '') AS string_nonspace
        ) x
    CROSS APPLY (
            SELECT  PATINDEX('%ID[0-9]%', string_nonspace) AS start
        ) y
    

    Output:

    ID string_nonspace
    7370 BrianLarry,HaysHillXX,ID007370,Optioncode-01234567,AAA-XX
    7366 ID007366-DaveJones-XX-OptionCode01219999
    7120 ID007120BrianSmithXXBrandedcompany
    7113 ID07113GaryBarnes-LLL01239111AAALLL
    7120 ID007120CharlesOldXXRecalloperation
    7439 ID0007439-KerryHill-MaidstoneXX-OptionCode0124234BBBXX
    6817 ID006817PaulGeorgeJackson12348464AAARecalloperation

    I remove spaces and then find starting position by PATINDEX('%ID[0-9]%'. Then i clip the string so ID-part is removed, and then search for first non-number value. When that done, you can do a SUBSTRING between start and the first non-number value. Finally, RIGHT(..., 4) gets the last four characters.

    Of course this code will fail on many malformed strings, but them the breaks, the in-data kinda "blowers"