
Merge two regexes with variable number of capture groups


I'm trying to match either

(\S+)(=)([fisuo])

or

(\S+)(!)

Then I want the results placed in a list (the capture groups), but all of my attempts produce extra, unwanted captures.

Here's some code:

#!/usr/bin/perl
#-*- cperl -*-
# $Id: test7,v 1.1 2023/04/10 02:57:12 bennett Exp bennett $
#

use strict;
use warnings;
use Data::Dumper;

foreach my $k ('debugFlags=s', 'verbose!') {
    my @v;

    # Below is the offensive looking code.  I was hoping for a regex
    # which would behave like this:

    if(@v = $k =~ m/^(\S+)(=)([fisuo])$/) {
      printf STDERR ("clownMatch = '%s' => %s\n\n", $k, Dumper(\@v));
    } elsif(@v = $k =~ m/^(\S+)(!)$/) {
      printf STDERR ("clownMatch = '%s' => %s\n\n", $k, Dumper(\@v));
    }

    @v = ();

    # This is one of my failed, aspirational matches.  I think I know
    # WHY it fails, but I don't know how to fix it.
    
    if(@v = $k =~ m/^(?:(\S+)(=)([fisuo]))|(?:(\S+)(!))$/) {
      printf STDERR ("hopefulMatch = '%s' => %s\n\n", $k, Dumper(\@v));
    }
    printf STDERR "===\n";
}

exit(0);
__END__

Output:

clownMatch = 'debugFlags=s' => $VAR1 = [
          'debugFlags',
          '=',
          's'
        ];


hopefulMatch = 'debugFlags=s' => $VAR1 = [
          'debugFlags',
          '=',
          's',
          undef,
          undef
        ];


===
clownMatch = 'verbose!' => $VAR1 = [
          'verbose',
          '!'
        ];


hopefulMatch = 'verbose!' => $VAR1 = [
          undef,
          undef,
          undef,
          'verbose',
          '!'
        ];


===
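Separately from the extra undef captures visible in the output, the hopeful pattern also has a precedence problem: `|` binds more loosely than `^` and `$`, so `^(?:A)|(?:B)$` means "A at the start of the string" or "B at the end of it", not "the whole string is A or B". The sketch below compares the question's pattern with a version whose alternation is wrapped in a single group; the two input strings are made-up malformed specs, not taken from the question.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# The question's pattern: ^ applies only to the first alternative,
# $ applies only to the second.
my $loose = qr/^(?:(\S+)(=)([fisuo]))|(?:(\S+)(!))$/;

# Same alternatives inside one group, so ^ and $ apply to both.
my $tight = qr/^(?:(\S+)(=)([fisuo])|(\S+)(!))$/;

# Hypothetical malformed inputs that a syntax check should reject.
my @inputs = ('debugFlags=sXYZ', 'junk verbose!');

for my $s (@inputs) {
    printf "%-16s loose: %-8s tight: %s\n",
        $s,
        ($s =~ $loose ? 'match' : 'no match'),
        ($s =~ $tight ? 'match' : 'no match');
}
```

The loose pattern accepts both strings because each alternative only needs to touch one end of the string; the grouped pattern rejects them.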

There are more details in the code comments, and the output follows the code. The '!' is a literal exclamation mark; I'm not confusing it with the logical-not operator.

Update Mon Apr 10 23:15:40 PDT 2023:

With the wise input of several readers, it seems that this question decomposes into a few smaller questions.

Can a regex return a variable number of capture groups?

I haven't heard one way or the other.
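For reference, in list context a Perl match returns one element per capture group in the pattern, so the count is fixed by the pattern itself, and groups that did not take part come back as undef. A minimal sketch of two common workarounds: filtering with `grep { defined }`, and a branch reset group `(?|...)` (Perl 5.10 and later), in which every alternative numbers its groups from the same starting point. `parse_spec` is a hypothetical helper name, and `$spec_re` is the question's hopeful regex with both alternatives grouped inside the anchors.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Five groups in total; both alternatives sit inside the anchors.
my $spec_re = qr/^(?:(\S+)(=)([fisuo])|(\S+)(!))$/;

# Branch reset: both alternatives reuse groups 1..3, so a match always
# yields 3 elements (the third is undef for "name!").
my $reset_re = qr/^(?|(\S+)(=)([fisuo])|(\S+)(!))$/;

# Hypothetical helper: match, then drop groups that did not participate.
sub parse_spec {
    my ($spec, $re) = @_;
    my @v = $spec =~ $re or return;   # empty list when there is no match
    return grep { defined } @v;
}

for my $k ('debugFlags=s', 'verbose!') {
    my @raw   = $k =~ $spec_re;
    my @reset = $k =~ $reset_re;
    my @clean = parse_spec($k, $spec_re);
    printf "%-12s raw=%d reset=%d clean=(%s)\n",
        $k, scalar @raw, scalar @reset, join(', ', @clean);
}
```

Filtering gives exactly the lists the question asks for, ('debugFlags', '=', 's') and ('verbose', '!'), but the variable length comes from the `grep`, not from the regex.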

Should one use a regex in this way, if it could?

Not without a compelling reason.

For my purposes, should I use a regex to create what is really a lexical-analyzer/parser?

No. I was using a regex for syntax checking and got carried away.

I learned a good deal, though. I hope moderators see fit to keep this post as a cautionary tale.

Everyone deserves points on this one, and can claim that they were robbed, citing this paragraph. @Schwern gets the points for being first. Thanks.


Solution

  • Since you're matching two different things, it seems perfectly reasonable to have two different matches.

    But, if you do want to combine them, you can do this:

    m{^
      (\S+)
      (?:
        =([fisuo]) |
        (!)
      )
      $
    }x
    

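    $1 is the name. $2 is the letter after the =, if present. $3 is the !, if present. The = itself is not captured, and in list context the match still returns three elements, with undef for whichever of $2 and $3 did not match.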

    For anything more complicated, use named captures or Regexp::Assemble.

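A minimal sketch of the combined pattern above in use, alongside a named-capture variant (Perl 5.10 and later) read back through `%+`, which lists only the named groups that actually captured. The names `name`, `letter` and `bang` are illustrative choices, not from the answer.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# The answer's pattern: $1 = name, $2 = letter after '=', $3 = '!'.
my $numbered = qr{^
    (\S+)
    (?:
      =([fisuo]) |
      (!)
    )
    $
}x;

# Same structure with named captures (illustrative names).
my $named = qr{^
    (?<name>\S+)
    (?:
      =(?<letter>[fisuo]) |
      (?<bang>!)
    )
    $
}x;

for my $k ('debugFlags=s', 'verbose!') {
    if ($k =~ $numbered) {
        # Exactly one of $2 and $3 is defined after a successful match.
        my $rest = defined $2 ? "letter=$2" : "bang=$3";
        print "$k: name=$1 $rest\n";
    }
    if ($k =~ $named) {
        # %+ holds only the named groups that captured in this match.
        my %spec = %+;
        print "$k: ", join(', ', map { "$_=$spec{$_}" } sort keys %spec), "\n";
    }
}
```

With the named version there is no undef to filter out, because a group that did not capture simply has no key in `%+`.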