regexperlperl-critic

Is there a better way to write Perl regexes with /x so the code is still easy to read?


I ran Perl::Critic on one of my scripts, and got this message:

Regular expression without "/x" flag at line 21, column 26. See page 236 of PBP.

I looked up the policy information here, and I understand that writing regular expressions in extended mode will help anyone who is looking at the code.

However, I am stuck as how to convert my code to use the /x flag.

CPAN Example:

# Match a single-quoted string efficiently...

m{'[^\\']*(?:\\.[^\\']*)*'};  #Huh?

# Same thing with extended format...

m{
    '           # an opening single quote
    [^\\']      # any non-special chars (i.e. not backslash or single quote)
    (?:         # then all of...
        \\ .    #    any explicitly backslashed char
        [^\\']* #    followed by an non-special chars
    )*          # ...repeated zero or more times
    '           # a closing single quote
}x;

This makes sense if you only look at the regex.

My Code:

if ($line =~ /^\s*package\s+(\S+);/ ) {

I am not exactly sure how to use an extended regex inside of an if statement. I can write it like this:

    if (
        $line =~ /
        ^\s*    # starting with zero or more spaces
        package
        \s+     # at least one space
        (\S+)   # capture any non-space characters
        ;       # ending in a semi-colon
        /x
      )
    {

And this works, but I think this is almost harder to read than the original. Is there a better way (or a best practice way) to write this? I guess I could create a variable using qr//.

I'm not really looking for advice on re-writing this specific regex (although if I can improve it, I'll take advice) - I'm more looking for advice on how to expand a regex inside of an if statement.

I know Perl::Critic is just a guideline, but it would be nice to follow it.

Thanks in advance!

EDIT: So after receiving a few answers, it became clear to me that making a regex multi-line with comments is not always necessary. People who understand basic regex should be able to understand what my example was doing - the comments I added were maybe a little unnecessary and verbose. I like the idea of using the extended regex flag, but still embedding spaces in the regex to make each part of the regex a little more clear. Thanks for all the input!


Solution

  • Well, I really don't think you should waste vertical screen real estate on this. On the other hand, if I were to write this pattern over several lines, I would have used braces and indented the pattern:

    if ($line =~ m{
            \A \s*
            package
            \s+
            (\S+)
            \s* ;
        }x 
    ) {
    

    IMHO, the following version is perfectly fine:

    if ( $line =~ m{ \A \s* package \s+ (\S+) \s* ; }x  ) {
    

    in terms of getting the benefit of m//x.

    The comments are completely unnecessary in this case because you are not doing anything tricky. I did add \s* before the semi-colon because sometimes people set the semi-colon apart from the package name and that should not throw off your match.