I want to use grep (PCRE) to find all single-quoted strings that are passed to my function foo().
Example function calls in my source code and the expected hits:
foo('Alice') -> Expected Hits: Alice
foo('Alice', 'Bob', 'Charlie') -> Expected Hits: Alice, Bob, Charlie
foo(flag ? 'Alice' : 'Bob') -> Expected Hits: Alice, Bob
My regex:
foo\([^\)]*\K(?:')([^'\)]*)(?:'\))
However, I get only the last single-quoted string for each function call, not all of them, as you can see in my regex101 playground: https://regex101.com/r/FlzDYp/1
How can I write a PCRE-compliant regex for grep that returns all expected hits?
You might use grep with -P for PCRE and -o to print only the matched parts.
The pattern in parts matches:
(?:   Non-capturing group
  \bfoo\(   Match the word foo followed by (
  (?=[^()]*\))   Positive lookahead to assert a closing ) to the right
  |   Or
  \G(?!^)   Assert the current position at the end of the previous match, but not at the start of the string (as \G can match at both of those positions)
)   Close the non-capturing group
[^']*   Match optional characters other than '
(?:'\h*[,:]\h*)?   Optionally match ' followed by either , or : between optional horizontal whitespace (this consumes the closing quote of the previous string and the separator)
'   Match the opening '
\K   Forget what has been matched so far, as we don't want that ' in the result
\w+   Match 1 or more word characters
Example:
grep -oP "(?:\bfoo\((?=[^()]*\))|\G(?!^))[^']*(?:'\h*[,:]\h*)?'\K\w+" file
See a regex demo for the matches.
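To try the command against the three calls from the question, here is a minimal sketch. It assumes GNU grep built with PCRE support (not every grep offers -P); the temporary file and the extra bar('Zed') line are only there for illustration.

```shell
#!/usr/bin/env bash
# Sample input: the three calls from the question plus one call that must not match.
sample=$(mktemp)
cat > "$sample" <<'EOF'
foo('Alice')
foo('Alice', 'Bob', 'Charlie')
foo(flag ? 'Alice' : 'Bob')
bar('Zed')
EOF

# -P enables PCRE (\G, \K, \h); -o prints each match on its own line.
hits=$(grep -oP "(?:\bfoo\((?=[^()]*\))|\G(?!^))[^']*(?:'\h*[,:]\h*)?'\K\w+" "$sample")
printf '%s\n' "$hits"

rm -f "$sample"
```

This should print Alice, then Alice, Bob, Charlie, then Alice, Bob, one name per line, and nothing for the bar(...) line. In an interactive bash session with history expansion enabled, the ! inside the double-quoted pattern may be expanded by the shell; running the command from a script, or after set +H, avoids that.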
An alternative using awk: first match the format foo(....), then use a while loop to print all the values between single quotes that consist of alphanumeric characters or underscores. The \047 is the octal escape for a single quote here.
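awk '/foo\([^][]*\)/ {              # only lines with foo( ... ) and no [ or ] in between
  # repeatedly find the next quoted word (\047 is a single quote)
  while (match($0, /\047[[:alnum:]_]+\047/)) {
    print substr($0, RSTART + 1, RLENGTH - 2)   # the match without its two quotes
    $0 = substr($0, RSTART + RLENGTH)           # continue after this match
  }
}' file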
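The same kind of check can be run for the awk version. This is a sketch that assumes an awk with POSIX character class support such as gawk; the temporary file and the bar('Zed') line are illustrative only.

```shell
#!/usr/bin/env bash
# Sample input: the three calls from the question plus one line without foo(...).
sample=$(mktemp)
cat > "$sample" <<'EOF'
foo('Alice')
foo('Alice', 'Bob', 'Charlie')
foo(flag ? 'Alice' : 'Bob')
bar('Zed')
EOF

# Same awk program as above, with its output collected in a variable.
names=$(awk '/foo\([^][]*\)/ {
  while (match($0, /\047[[:alnum:]_]+\047/)) {
    print substr($0, RSTART + 1, RLENGTH - 2)
    $0 = substr($0, RSTART + RLENGTH)
  }
}' "$sample")
printf '%s\n' "$names"

rm -f "$sample"
```

Note that the filter only selects lines; the while loop then scans the whole selected line. A line such as foo('Alice') + bar('Bob') would therefore print both Alice and Bob, whereas the grep pattern stays inside the foo(...) call.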