excel regex vba parentheses

Regex: matching an optional group of text within parentheses (including the parentheses themselves) while grouping


I am unable to modify my pattern so that it detects and includes an optional block of text within parentheses.

I am trying to extract text, and my first version of the pattern works as shown below.

Code:

([A-Z]{2,}-\d+)\(?\s.?([a-zA-Z]+\s?[a-zA-Z]+)\s\d*-\d*\s([a-zA-Z]+\s?[a-zA-Z]+\s?[a-zA-Z]+)

Data:

Design No. TAHN-7075 Natural Gray 0997-101 White 0911-101

Output:

TAHN-7075
Natural Gray
White

But it fails when the data contains a parenthesized block right after the design number, as below:

Data:

Design No. TAHN-7082(CS-02) Natural Gray 0997-101 Natural Gray 0997-101

I tried the pattern below. It matches the data above, but it creates an extra capture group and no longer matches data without the parentheses.

Code:

([A-Z]{2,}-\d+\(([^\)]+)\))\(?\s.?([a-zA-Z]+\s?[a-zA-Z]+)\s\d*-\d*\s([a-zA-Z]+\s?[a-zA-Z]+\s?[a-zA-Z]+)

Output:

TAHN-7082(CS-02)
CS-02
Natural Gray
Natural Gray
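For reference, both failures can be reproduced outside the regex testing website. The sketch below uses Python's re module as a stand-in for the tester and for VBA's VBScript.RegExp (an assumption: the constructs used here behave the same in both engines). The constant and variable names are illustrative only.

```python
import re

# First attempt: no provision for a "(...)" block after the design number.
FIRST_ATTEMPT = re.compile(
    r"([A-Z]{2,}-\d+)\(?\s.?([a-zA-Z]+\s?[a-zA-Z]+)\s\d*-\d*\s"
    r"([a-zA-Z]+\s?[a-zA-Z]+\s?[a-zA-Z]+)"
)

# Second attempt: the "(...)" block is mandatory and gets its own capture group.
SECOND_ATTEMPT = re.compile(
    r"([A-Z]{2,}-\d+\(([^\)]+)\))\(?\s.?([a-zA-Z]+\s?[a-zA-Z]+)\s\d*-\d*\s"
    r"([a-zA-Z]+\s?[a-zA-Z]+\s?[a-zA-Z]+)"
)

plain = "Design No. TAHN-7075 Natural Gray 0997-101 White 0911-101"
with_parens = "Design No. TAHN-7082(CS-02) Natural Gray 0997-101 Natural Gray 0997-101"

# None: after "TAHN-7082" the pattern needs whitespace, but "(CS-02)" follows.
print(FIRST_ATTEMPT.search(with_parens))
# None: the pattern requires a literal "(" after the digits.
print(SECOND_ATTEMPT.search(plain))
# ('TAHN-7082(CS-02)', 'CS-02'): the inner group is the unwanted extra group.
print(SECOND_ATTEMPT.search(with_parens).groups()[:2])
```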

I need help making my regex match both types of data, i.e. with or without the parenthesized block.

I am currently testing it on a regex testing website and will port it to VBA once it works.

Please let me know if I should provide any further information.

Best regards


Solution

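  • You can omit the .?, as it causes the first character of the following word to be missed, and you can also omit the single optional opening parenthesis \(?.

    Then make the whole parenthesized part optional, using a non-capturing group (?:...) so that it does not create an extra group: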

    ([A-Z]{2,}-\d+(?:\([^)]+\))?)\s([a-zA-Z]+\s?[a-zA-Z]+)\s\d*-\d*\s([a-zA-Z]+\s?[a-zA-Z]+\s?[a-zA-Z]+)
    

    See a regex demo

    Note that in the part \d*-\d* the digits are optional, so it also matches a lone -, and \s? matches an optional whitespace character.


    To match one or more words consisting of the characters A-Za-z, you can use an optionally repeated non-capturing group (?:\s+[a-zA-Z]+)* inside a capture group.

    To prevent partial word matches, you could also use word boundaries \b:

    \b([A-Z]{2,}-\d+(?:\([^)]+\))?)\s+([a-zA-Z]+(?:\s+[a-zA-Z]+)*)\s+\d+-\d+\s+([a-zA-Z]+(?:\s+[a-zA-Z]+)*)\b
    

    See another regex demo
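Before porting to VBA, both suggested patterns can be checked against the two sample lines from the question. This is a minimal sketch using Python's re module as a stand-in for VBScript.RegExp (assumed to behave the same for these constructs). In the second pattern, each colour group is written as ([a-zA-Z]+(?:\s+[a-zA-Z]+)*), i.e. the repeated non-capturing group sits inside the capture group, so a colour of one or more words is accepted. The helper extract and the constant names are hypothetical.

```python
import re

# Pattern 1: the optional "(...)" block is folded into group 1 via (?:...)?.
PATTERN_SIMPLE = re.compile(
    r"([A-Z]{2,}-\d+(?:\([^)]+\))?)\s([a-zA-Z]+\s?[a-zA-Z]+)\s\d*-\d*\s"
    r"([a-zA-Z]+\s?[a-zA-Z]+\s?[a-zA-Z]+)"
)

# Pattern 2: word boundaries, and one or more words per colour group.
PATTERN_WORDS = re.compile(
    r"\b([A-Z]{2,}-\d+(?:\([^)]+\))?)\s+([a-zA-Z]+(?:\s+[a-zA-Z]+)*)"
    r"\s+\d+-\d+\s+([a-zA-Z]+(?:\s+[a-zA-Z]+)*)\b"
)


def extract(pattern, line):
    """Hypothetical helper: return (design_no, first_colour, second_colour) or None."""
    m = pattern.search(line)
    return m.groups() if m else None


samples = [
    "Design No. TAHN-7075 Natural Gray 0997-101 White 0911-101",
    "Design No. TAHN-7082(CS-02) Natural Gray 0997-101 Natural Gray 0997-101",
]

for line in samples:
    print(extract(PATTERN_SIMPLE, line))
    print(extract(PATTERN_WORDS, line))
```

Both patterns should return ('TAHN-7075', 'Natural Gray', 'White') for the first line and ('TAHN-7082(CS-02)', 'Natural Gray', 'Natural Gray') for the second, with exactly three groups in each case. In VBA, the same three values would be read from the SubMatches collection of a VBScript.RegExp match.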