regex grep regex-group pcre regex-greedy

Regex to find string arguments of a function call (lookbehind with multiple hits)


I want to use grep (PCRE) to find all single-quoted strings that are passed to my function foo().

Example function calls in my source code and the expected hits:

foo('Alice')                    -> Expected Hits: Alice
foo('Alice', 'Bob', 'Charlie')  -> Expected Hits: Alice, Bob, Charlie
foo(flag ? 'Alice' : 'Bob')     -> Expected Hits: Alice, Bob

My regex:

foo\([^\)]*\K(?:')([^'\)]*)(?:'\))

However, I get only the last single-quoted string of each function call rather than all of them, as you can see in my regex101 playground: https://regex101.com/r/FlzDYp/1

How can I write a PCRE-compatible regex for grep that returns all expected hits?


Solution

  • You might use grep with -P for PCRE and -o to print only the matched parts.

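    The pattern in parts matches:

      • (?: start a non-capturing group for the two alternatives
      • \bfoo\( match a word boundary followed by foo(
      • (?=[^()]*\)) positive lookahead: assert a closing ) to the right with no other parentheses in between
      • | or
      • \G(?!^) assert the position at the end of the previous match, but not at the start of the line
      • ) close the group
      • [^']* match optional characters other than '
      • (?:'\h*[,:]\h*)? optionally match the closing ' of the previous string followed by , or : between optional horizontal whitespace
      • '\K match the opening ' and discard everything matched so far from the reported match
      • \w+ match one or more word characters

    Example: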

    grep -oP "(?:\bfoo\((?=[^()]*\))|\G(?!^))[^']*(?:'\h*[,:]\h*)?'\K\w+" file
    

    See a regex demo for the matches.
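
To try the command on the inputs from the question, the following sketch writes the three example calls to a temporary file and runs the same pattern over it. The extra bar('Dave') line, the file contents as a whole, and the helper name extract_foo_args are made up for illustration. GNU grep built with PCRE support is assumed.

```shell
# Hypothetical sample file: the three calls from the question plus one
# call to another function that should produce no hits.
sample=$(mktemp)
cat > "$sample" <<'EOF'
foo('Alice')
foo('Alice', 'Bob', 'Charlie')
foo(flag ? 'Alice' : 'Bob')
bar('Dave')
EOF

# Hypothetical helper wrapping the grep command from the answer.
# -o prints each match on its own line; -P enables PCRE (GNU grep).
extract_foo_args() {
  grep -oP "(?:\bfoo\((?=[^()]*\))|\G(?!^))[^']*(?:'\h*[,:]\h*)?'\K\w+" "$1"
}

extract_foo_args "$sample"
```

With this sample the output should be Alice, Alice, Bob, Charlie, Alice, Bob, one per line, and nothing for the bar(...) line.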


    An alternative is awk: first select the lines that contain a call of the form foo(...), then use a while loop to print every value between single quotes that consists only of alphanumeric characters or underscores.

    Here \047 is the octal escape for a single quote.

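    awk '/foo\([^()]*\)/ {                      # only lines containing foo(...) without nested parentheses
      while(match($0, /\047[[:alnum:]_]+\047/)) {
        print substr($0, RSTART+1, RLENGTH-2)    # drop the surrounding quotes
        $0 = substr($0, RSTART + RLENGTH)        # continue after this match
      }
    }' file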
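
Note that the awk loop scans the whole line once the line contains a foo(...) call, so a quoted string belonging to another call on the same line is printed as well. A possible variant, sketched below, first cuts out each foo(...) span and only searches inside it. The sample file and the helper name foo_args_awk are hypothetical, and [A-Za-z0-9_] is used as an ASCII-only stand-in for [[:alnum:]_] in case the local awk lacks POSIX character classes. Like the original, it does not handle nested parentheses, and it has no word boundary before foo.

```shell
# Hypothetical sample: the second line also contains a call to bar().
sample=$(mktemp)
cat > "$sample" <<'EOF'
foo('Alice')
x = bar('Dave'); foo('Alice', 'Bob', 'Charlie')
foo(flag ? 'Alice' : 'Bob')
EOF

# Hypothetical helper: print quoted words that occur inside foo(...) only.
foo_args_awk() {
  awk '{
    line = $0
    # Visit every foo(...) call on the line (no nested parentheses).
    while (match(line, /foo\([^()]*\)/)) {
      call = substr(line, RSTART, RLENGTH)
      line = substr(line, RSTART + RLENGTH)
      # Print each single-quoted word found inside this call.
      while (match(call, /\047[A-Za-z0-9_]+\047/)) {
        print substr(call, RSTART + 1, RLENGTH - 2)
        call = substr(call, RSTART + RLENGTH)
      }
    }
  }' "$1"
}

foo_args_awk "$sample"
```

For this sample the output should be Alice, Alice, Bob, Charlie, Alice, Bob, one per line; Dave is skipped because it sits inside bar(...).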