regex perl

Perl Regex fails to compile looking for \object


I am having difficulty creating a regular expression that matches a string like F:\work\object\src. The demonstration below shows what I've tried. Note that $match comes from a database field, which is why it's defined as a string that is then turned into a regex by the qr operator.

#!/opt/perl/bin/perl
  use Try::Tiny;
  use Data::Dumper::Concise;

  my $str = 'F:\work\object\src';

  my @matches = (
      "\b F\\:\\work\\object\\src \b",
      '\b F\:\work\object\src \b',
      q{\b F\\:\\work\\object\\src \b},
      q{\b\QF:\work\object\src\E \b},
      q{\b F\\:\\work\\\\object\\src \b},
      qq{\b F\\:\\work\\\\object\\src \b},
  );

  my $i = 0;
  foreach my $match (@matches) {
      print "attempt ".$i++."\n";
      try {
          my $re  = qr{($match)}xims;
          print "Built successfully.\n";
          if ($str =~ /$re/) {
              print "Match\n";
          }
          else {
              print "But did not match!\n";
              print Dumper($re);
          }
      }
      catch {
          print "$match failed to build re\n";
          print "$_\n";
      };
  }

The output of this test program is as follows:

attempt 0
 F\:\work\object\src failed to build re
Missing braces on \o{} in regex; marked by <-- HERE in m/ F\:\work\o <-- HERE bject\sr)/ at ./reparse.pl line 20.

attempt 1
\b F\:\work\object\src \b failed to build re
Missing braces on \o{} in regex; marked by <-- HERE in m/(\b F\:\work\o <-- HERE bject\src \b)/ at ./reparse.pl line 20.

attempt 2
\b F\:\work\object\src \b failed to build re
Missing braces on \o{} in regex; marked by <-- HERE in m/(\b F\:\work\o <-- HERE bject\src \b)/ at ./reparse.pl line 20.

attempt 3
\b\QF:\work\object\src\E \b failed to build re
Missing braces on \o{} in regex; marked by <-- HERE in m/(\b\QF:\work\o <-- HERE bject\src\E \b)/ at ./reparse.pl line 20.

attempt 4
Built successfully.
But did not match!
qr/(\b F\:\work\\object\src \b)/msix
attempt 5
Built successfully.
But did not match!
qr/ F\:\work\\object\src)/msi

Attempts 4 and 5 seem to escape the \o but fail in matching the string. Would appreciate help crafting the string that will work.


Solution

  • If those matching strings aren't set in stone, I'd change it like this --

    Define your matching string(s) as single-quoted and without those escapes, like

    q(F:\work\object\src)
    

    and build patterns using

    qr{\Q$match\E}
    

    If for some reason this isn't feasible, \Q..\E (quotemeta) can be used directly in the regex. Add word boundaries, or whatever else is needed, to the regex outside of the quoted part.

    A one-liner example (in Linux)

    perl -wE'$s=q(F:\o); $m=q(F:\o); $p=qr{\Q$m\E}; say $1 if $s =~ m{\b($p)\b}'
    

    Prints F:\o.

    As a general habit, I'd also match using a delimiter other than /, as in =~ m{$pattern}, so that the code stays "portable" to paths that contain /. For your string this isn't needed.

    If the matching strings have to live in a database, I'd consider it even more important to store the exact paths there, without escapes of any kind. Then protect them with \Q..\E when you build the patterns.
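    A minimal sketch of that advice, assuming the database hands back the raw path. The @stored list and the literal_pattern helper are hypothetical stand-ins, not taken from the question. It also illustrates why attempt 3 in the question fails: \Q is handled when Perl interpolates the pattern literal, so a \Q that arrives inside the variable's contents is not treated as a quoting request.

```perl
use strict;
use warnings;

# Hypothetical stand-in for database rows: raw paths, no escapes.
my @stored = ( 'F:\work\object\src', 'C:\temp\new' );

# \Q..\E is written in the pattern literal, around the variable,
# so the interpolated text is matched literally.
sub literal_pattern {
    my ($raw) = @_;
    return qr{\Q$raw\E};
}

my $str = 'F:\work\object\src';
for my $raw (@stored) {
    my $verdict = $str =~ literal_pattern($raw) ? 'match' : 'no match';
    printf "%s: %s\n", $raw, $verdict;
}

# Attempt 3 from the question: here \Q is part of the data, so the
# regex compiler still sees a bare \o and refuses the pattern.
my $in_data  = '\QF:\work\object\src\E';
my $compiled = eval { no warnings; my $re = qr{$in_data}; 1 } ? 1 : 0;
printf "pattern with \\Q inside the data %s\n",
    $compiled ? 'compiled' : 'failed to compile';
```

    The point of the design is that the stored value stays a plain path, and the decision to treat it literally is made in one place, where the pattern is built.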


    Pattern constructs such as \b can't go inside the \Q..\E sequence: they get escaped along with everything else and so lose their special meaning. (quotemeta escapes every ASCII character that isn't a word character, [a-zA-Z0-9_].)
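    To make the escaping visible, here is a small sketch using the quotemeta function, which is the function form of \Q..\E. The variable names are only illustrative.

```perl
use strict;
use warnings;

my $raw = 'F:\work\object\src';

# Every non-word ASCII character gets a backslash in front of it.
my $quoted = quotemeta($raw);
print "$quoted\n";    # F\:\\work\\object\\src

# Quoting the \b together with the path turns it into a literal
# backslash followed by "b", so it no longer means "word boundary".
my $wrong    = quotemeta( '\b' . $raw . '\b' );
my $re_wrong = qr{$wrong};

# Keeping \b outside of the quoted part preserves its meaning.
my $right    = '\b' . quotemeta($raw) . '\b';
my $re_right = qr{$right};

print "boundary quoted:     ", ( $raw =~ $re_wrong ? 'match' : 'no match' ), "\n";
print "boundary not quoted: ", ( $raw =~ $re_right ? 'match' : 'no match' ), "\n";
```

    Only the second pattern matches the path, since the first one looks for the two literal characters \b on either side of it.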

    Either build up a pattern string that includes it for later use, like

    my $patt = '\b' . qr{\Q$match\E} . '\b';
    

    which can then go under qr if needed ($patt = qr{$patt}), or add it directly in the regex, keeping \Q..\E around the variable only

    ... =~ m{\b\Q$match\E\b}
    

    [ Edit: Along with adding this footnote I added that word boundary to the one-liner example above ]
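    Putting the pieces together, here is a sketch of how the loop in the question might build its regex, assuming $match holds the raw path exactly as stored. path_regex is a hypothetical helper, and the /i flag is carried over from the question's flags as a choice rather than a requirement. Note that \b only helps when the path begins and ends with a word character, as this one does.

```perl
use strict;
use warnings;

# Hypothetical helper: capture the raw path between word boundaries.
# \Q..\E protects only the interpolated text, not the \b around it.
sub path_regex {
    my ($match) = @_;
    return qr{\b(\Q$match\E)\b}i;
}

my $re = path_regex('F:\work\object\src');

my @candidates = (
    'F:\work\object\src',           # the exact path
    'cd f:\Work\object\src now',    # embedded, different case
    'F:\work\object\srcs',          # no word boundary after "src"
);

for my $str (@candidates) {
    if ( $str =~ $re ) {
        print "Match: $1\n";
    }
    else {
        print "No match in: $str\n";
    }
}
```

    The first two candidates match and the third does not, because the trailing "s" leaves no word boundary after "src".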