I need to print all my matched strings from a stored line in Perl. I have seen various posts on this ("Print the matched string using perl" and "Perl Regex - Print the matched value")
and I experimented by first trying to print just the first word. But I get this error:
Use of uninitialized value $1 in concatenation (.) or string at rg.pl line 10.
I have tried with split and arrays and that works, but printing $1 throws the error.
My code is here
#!/usr/bin/perl
use warnings;
use strict;
#my $line = "At a far distance near the bar, was a parked car. Star were shining in the night. The boy in the car had scar and he was at war with his enemy. \n";
my $line = "At a far distance near the bar, was a parked car. \n";
if($line =~ /[a-z]ar/gi)
{ print "$1 \n"; }
$_ = $line;
I want my output for this code to be
far
and subsequently print all the words containing ar,
far
near
bar
parked
car
I even tried changing my code as below, but that didn't work either; same error:
if($line =~ /[a-z]ar/gi) {
my $match = $1;
print "$match \n"; }
First, you didn't capture anything, which is how the $n variables ($1, $2, ...) are populated. Put parentheses around what you want to be captured into $1:
if ($line =~ /([a-z]ar)/i) { print "$1\n" }
I've removed the /g, which is unneeded here (and has potential for trouble†).
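To make the effect of the parentheses concrete, here is a minimal sketch that runs the same match with and without a capture group. The variable names $without and $with are only illustrative and are not from the question.

```perl
use strict;
use warnings;

my $line = "At a far distance near the bar, was a parked car. \n";

# No capture group: the match succeeds, but nothing is stored in $1.
my $without = $line =~ /[a-z]ar/i ? $1 : undef;

# With a capture group, $1 holds what the parentheses matched.
my $with = $line =~ /([a-z]ar)/i ? $1 : undef;

print defined $without ? "$without\n" : "(nothing captured)\n";
print "$with\n";    # prints "far"
```

The first match is what triggers the "uninitialized value $1" warning in the question: the pattern matched, but there was no group to fill $1.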
Next, your pattern requires and captures exactly one letter followed by the literal ar, no more and no less. That won't capture near, nor will it capture parked (it'll get only par). It will not even match a word that starts with ar, since it requires a letter before ar. You need quantifiers to tell it how many times to match a letter. And you also want to find all matches.
One way is to scoop them all up by providing list context and the /g (global) modifier:
my @words = $line =~ /([a-z]*ar[a-z]*)/gi;
print "$_\n" for @words;
The [a-z]* means to match a letter zero or more times, so an optional string of letters. We also added an optional string of letters after ar. The /g makes it continue through the string after a match, to find all such patterns. In list context the list of matches is returned.
Or, you can match in scalar context like in the first example, but in a while loop:
while ($line =~ /([a-z]*ar[a-z]*)/gi) { print "$1\n" }
Here /g does something different. It matches the pattern once and returns true, so the while condition is true and we print. Then it comes back and looks for a match from where the previous one ended ... and keeps doing this until there are no more matches.
This is complex behavior altogether. From Regexp Quote-Like Operators in perlop:
The /g modifier specifies global pattern matching--that is, matching as many times as possible within the string. How it behaves depends on the context. In list context, it returns a list of the substrings matched by any capturing parentheses in the regular expression. If there are no parentheses, it returns a list of all the matched strings, as if there were parentheses around the whole pattern.
In scalar context, each execution of m//g finds the next match, returning true if it matches, and false if there is no further match. [...]
Read about this in more detail and in a tutorial manner in perlretut, under "Global matching."
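The behaviors described in that perlop passage can be seen side by side in a small sketch. The array names (@all, @prefixes, @offsets) are made up for illustration, and the second pattern moves the parentheses only to show that list context then returns just the captured part.

```perl
use strict;
use warnings;

my $line = "At a far distance near the bar, was a parked car. \n";

# List context, no parentheses: every whole match is returned.
my @all = $line =~ /[a-z]*ar[a-z]*/gi;

# List context, with parentheses: only the captured part is returned.
my @prefixes = $line =~ /([a-z]*)ar[a-z]*/gi;

# Scalar context: one match per execution; pos() tells where the
# last match ended, which is where the next search will start.
my @offsets;
while ($line =~ /[a-z]*ar[a-z]*/gi) {
    push @offsets, pos($line);
}

print "@all\n";         # far near bar parked car
print "@prefixes\n";    # f ne b p c
print scalar(@offsets), " matches found in scalar context\n";
```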
† Note on using the /g modifier in scalar context
I've used that above, in while (/.../g), which is a very common way to hop over all occurrences of the pattern in a string, each time giving us control in the while body.
While this use is intended and idiomatic, the use of /g in scalar context can bring subtle trouble when not in the loop condition: the next regex with /g on this variable will continue from the previous match, not from the string's beginning, which may be unexpected.
That "next regex" may also simply be that same expression, in the next pass of some larger loop in which our expression happens to be, and this holds across function calls as well. Consider:
use warnings;
use strict;
use feature 'say';
my $s = q(one two three);
sub func { say $1 if $_[0] =~ /(\w+)/g }; # /g may be of great consequence!
for (1..4) {
    # ... perhaps much, much later ...
    func($s);
}
This loop prints the lines one, then two, then three, and that's that: the fourth call finds no further match and prints nothing. This (working) example is so bare-bones that it is artificial, but I hope it conveys that /g in scalar context may surprise.
For one thing, it is not uncommon to see /g on a regex in an if condition where it is plain wrong.
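As a rough illustration of that last point, the sketch below runs the same test four times, once with /g and once without. The names @with_g and @without_g are hypothetical; the string is the one from the example above.

```perl
use strict;
use warnings;

my $s = q(one two three);

# With /g in scalar context the match position is remembered on $s,
# so the same test gives a different answer each time it runs.
my @with_g;
for (1..4) {
    push @with_g, ($s =~ /(\w+)/g ? $1 : 'no match');
}

# Without /g every test starts from the beginning of the string.
my @without_g;
for (1..4) {
    push @without_g, ($s =~ /(\w+)/ ? $1 : 'no match');
}

print join(", ", @with_g), "\n";       # one, two, three, no match
print join(", ", @without_g), "\n";    # one, one, one, one
```

So for a plain yes/no test in an if condition, dropping /g is usually the simplest fix. If /g is really needed there, resetting the position with pos($s) = 0 before the match is another option.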