python regex parentheses

Regex for matching data between parentheses, but with a pattern to match inside


I have seen many examples of how to get data between parentheses with a regex in Python, but none that also require a pattern inside the parentheses.

For example, I have this data:

Overall (each): 37 1/4 × 74 1/2 × 7 7/8 in. (94.6 × 189.2 × 20 cm)
Each, 30 x 50 in. (76.2 x 127 cm.)
24 3/8 x 14 5/8 x 5 1/8 in. (61.9 x 37.1 x 13 cm)

What I am trying to achieve, at a minimum, is:

(94.6 × 189.2 × 20 cm)
(76.2 x 127 cm.)
(61.9 x 37.1 x 13 cm)

The ideal result would be the following, although I am sure it will require a second split:

94.6, 189.2, 20 
76.2, 127
61.9, 37.1, 13

Currently I am trying a regex, but without success: I cannot capture just the parenthesized cm data.


Solution

  • Use

    \(([^()]*\bcm\b[^()]*)\)
    

    Explanation

    --------------------------------------------------------------------------------
      \(                       '('
    --------------------------------------------------------------------------------
      (                        group and capture to \1:
    --------------------------------------------------------------------------------
        [^()]*                   any character except: '(', ')' (0 or
                                 more times (matching the most amount
                                 possible))
    --------------------------------------------------------------------------------
        \b                       the boundary between a word char (\w)
                                 and something that is not a word char
    --------------------------------------------------------------------------------
        cm                       'cm'
    --------------------------------------------------------------------------------
        \b                       the boundary between a word char (\w)
                                 and something that is not a word char
    --------------------------------------------------------------------------------
        [^()]*                   any character except: '(', ')' (0 or
                                 more times (matching the most amount
                                 possible))
    --------------------------------------------------------------------------------
      )                        end of \1
    --------------------------------------------------------------------------------
      \)                       ')'
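
As a minimal sketch of how the pattern above could be applied in Python, the snippet below captures each parenthesized group that contains the word "cm" and then pulls the numbers out of it in a second pass. The names `CM_GROUP`, `NUMBER`, and `cm_dimensions` are illustrative, and the number pattern is an assumption (plain integers or decimals with a dot); the sample strings use the "cm" unit that the pattern expects.

```python
import re

# Pattern from the answer: a parenthesized group that contains the word "cm".
CM_GROUP = re.compile(r"\(([^()]*\bcm\b[^()]*)\)")

# Assumed helper pattern: an integer or a decimal number such as 94.6.
NUMBER = re.compile(r"\d+(?:\.\d+)?")


def cm_dimensions(text):
    """Return one list of number strings per '(... cm ...)' group in text."""
    return [NUMBER.findall(group) for group in CM_GROUP.findall(text)]


lines = [
    "Overall (each): 37 1/4 × 74 1/2 × 7 7/8 in. (94.6 × 189.2 × 20 cm)",
    "Each, 30 x 50 in. (76.2 x 127 cm.)",
    "24 3/8 x 14 5/8 x 5 1/8 in. (61.9 x 37.1 x 13 cm)",
]

for line in lines:
    for dims in cm_dimensions(line):
        # Join the numbers of one group, e.g. "94.6, 189.2, 20".
        print(", ".join(dims))
```

Because the group must contain `cm` as a whole word, the `(each)` in the first line is skipped. If the values are needed as numbers rather than strings, they can be converted with `float` after extraction.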