javascript regex recursion capture-group

Is it possible to recursively capture a FINITE number of matches (all of the same format) using ECMA RegEx?


For instance, say we're looking at a query string, all-lowercase, all non-numeric, no special characters (just [a-z] and =):

?some=querystring&ssembly=containing&n=indeterminate&mount=of&ll=potentially&ccordant=matches

Let us take as a given we know there will be three key-value pairs we wish to capture, and even that they are located at the beginning of said string:

Now, intuitively, it seems like I should be able to use something like...

^\?(&?[a-z=]+){3}.*$

...or possibly...

^\?(?:&?([a-z=]+)){3}.*$

...but, of course, the only capture this yields is

n=indeterminate
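
The behaviour described above can be reproduced with a short sketch (variable names here are illustrative only). A quantified capture group owns a single slot, so each repetition overwrites what the previous one stored:

```javascript
const input = "?some=querystring&ssembly=containing&n=indeterminate&mount=of&ll=potentially&ccordant=matches";

// The group ([a-z=]+) runs three times, but it has only one capture slot,
// so every iteration overwrites the value stored by the one before it.
const repeated = /^\?(?:&?([a-z=]+)){3}.*$/.exec(input);

console.log(repeated.length); // 2: the full match plus one group
console.log(repeated[1]);     // "n=indeterminate"
```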

Is there a syntax that would allow me to capture all three groups (as independent, accessible values, natch) without having to resort to the following?

^\?([a-z=]+)&([a-z=]+)&([a-z=]+).*$

I know there's no way to capture n instances (an arbitrarily-large set), but, given this is a finite number of captures I wish to obtain from my finite automaton...

I know full well there are any number of ways to accomplish this in JavaScript, or any other language for that matter. I'm specifically trying to ascertain if I'm stuck with the WET expression above.
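
For reference, if the unrolled expression does turn out to be necessary, its repetition can at least be generated rather than typed by hand. This is only a sketch of that workaround, not an answer to the pure-regex question; `pair`, `count`, and `unrolled` are made-up names:

```javascript
const input = "?some=querystring&ssembly=containing&n=indeterminate&mount=of&ll=potentially&ccordant=matches";

// Hypothetical names: `pair` is one key=value fragment, `count` how many are wanted.
const pair = "([a-z=]+)";
const count = 3;

// Builds the same source as ^\?([a-z=]+)&([a-z=]+)&([a-z=]+).*$
const unrolled = new RegExp("^\\?" + Array(count).fill(pair).join("&") + ".*$");

// Drop the full match and keep the three captured groups.
const [, ...pairs] = unrolled.exec(input);
console.log(pairs); // ["some=querystring", "ssembly=containing", "n=indeterminate"]
```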


Solution

  • Explaining in full how the various regex flavors handle capturing the FIRST run of a
    specific number of contiguous segments would take a long discussion, so this sticks to the practical part.

    The JavaScript regex below does the job. Instead of repeating a capture group, it returns each
    segment as a separate global match, which as far as I know is the only way to do it in JS with a single, non-unrolled pattern.
    It relies on a lookbehind assertion, so it needs an engine with ES2018 support.
    Note that this regex matches nothing when there are fewer than 3 contiguous segments.
    You can test it here https://regex101.com/r/3oWjwz/1

    Other engines offer different tools for this task.
    For example, the .NET engine has by far the most comprehensive toolset for this kind of thing (Capture Collections), because it retains every iteration of a repeated group:

    ^\?(?:&?([a-z=]+)(?![a-z=])){3}.*$

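    var input = '?some=querystring&ssembly=containing&n=indeterminate&mount=of&ll=potentially&ccordant=matches';
    
    // The lookbehind anchors at "?", checks that three complete segments follow it,
    // and lets at most two earlier segments sit before the current match.
    var regex = RegExp("(?<=^\\?(?=(?:&?[a-z=]+(?![a-z=])){3})(?:&?[a-z=]+){0,2}&?)[a-z=]+", 'g');
    
    // Each of the first three segments comes back as its own match:
    // ["some=querystring", "ssembly=containing", "n=indeterminate"]
    console.log( input.match(regex) );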
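
The remark about fewer than three segments can be checked directly. In this sketch the two shortened input strings are made up for illustration; the pattern itself is the one from the answer, written as a regex literal:

```javascript
const regex = /(?<=^\?(?=(?:&?[a-z=]+(?![a-z=])){3})(?:&?[a-z=]+){0,2}&?)[a-z=]+/g;

// Made-up inputs: one with four segments, one with only two.
const four = "?some=querystring&ssembly=containing&n=indeterminate&mount=of";
const two = "?some=querystring&ssembly=containing";

// With the g flag, match() returns every match, or null when there is none.
const fromFour = four.match(regex);
const fromTwo = two.match(regex);

console.log(fromFour); // ["some=querystring", "ssembly=containing", "n=indeterminate"]
console.log(fromTwo);  // null, because the {3} lookahead cannot be satisfied
```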

    Here is how the same approach applies to different quantifier forms.

    Quantifier Unlimited +
    https://regex101.com/r/bOGyJy/1

    # Quantifier Unlimited +  
    # (?<=^\?(?:&?[a-z=]+(?![a-z=]))*&?)[a-z=]+
    # https://regex101.com/r/bOGyJy/1
    
    (?<=
       ^ \? 
       (?:
          &? [a-z=]+ 
          (?! [a-z=] )
       )*
       &?
    )
    [a-z=]+ 
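
A quick way to try the unlimited form in JavaScript, as a sketch. The `broken` string is a made-up input in which a `#` interrupts the run, to show that only the contiguous leading segments are returned:

```javascript
const unlimited = /(?<=^\?(?:&?[a-z=]+(?![a-z=]))*&?)[a-z=]+/g;

const input = "?some=querystring&ssembly=containing&n=indeterminate&mount=of&ll=potentially&ccordant=matches";

// Made-up input: "#" is outside [a-z=&], so it ends the contiguous run.
const broken = "?some=querystring&ssembly=containing#n=indeterminate";

const all = input.match(unlimited);
const leading = broken.match(unlimited);

console.log(all.length); // 6
console.log(leading);    // ["some=querystring", "ssembly=containing"]
```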
    

    Quantifier Exact {3}
    https://regex101.com/r/3oWjwz/1

    # Quantifier Exact {3}  
    # (?<=^\?(?=(?:&?[a-z=]+(?![a-z=])){3})(?:&?[a-z=]+){0,2}&?)[a-z=]+
    # https://regex101.com/r/3oWjwz/1
    
    (?<=
       ^ \? 
       (?=
          (?:
             &? [a-z=]+ 
             (?! [a-z=] )
          ){3}                     # Exact range 3
       )
       (?: &? [a-z=]+ ){0,2}       # Zero to one less than max range
       &?
    )
    [a-z=]+ 
    

    Quantifier Range {2,4}
    https://regex101.com/r/D1NrLQ/1

    # Quantifier Range {2,4}  
    # (?<=^\?(?=(?:&?[a-z=]+(?![a-z=])){2,4})(?:&?[a-z=]+){0,3}&?)[a-z=]+
    # https://regex101.com/r/D1NrLQ/1
    
    (?<=
       ^ \? 
       (?=
          (?:
             &? [a-z=]+ 
             (?! [a-z=] )
          ){2}                     # 2, the minimum of the range (inside a lookahead this succeeds exactly when {2,4} does)
       )
       (?: &? [a-z=]+ ){0,3}       # Zero to one less than max range
       &?
    )
    [a-z=]+
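
The exact and range forms differ only in two numbers, so they can be generated from one template. The helper below is hypothetical (`leadingSegments` is not part of the answer or of any library) and is a sketch that assumes the same [a-z=] segment format and a finite maximum:

```javascript
// Hypothetical helper: builds the lookbehind pattern that returns the first
// `min` to `max` contiguous segments after "?" as separate global matches.
function leadingSegments(min, max) {
  const segment = "&?[a-z=]+";
  // Lookahead guard: at least `min` complete segments must follow the "?".
  const guard = `(?=(?:${segment}(?![a-z=])){${min}})`;
  // Up to max - 1 earlier segments may sit between "?" and the current match.
  const earlier = `(?:${segment}){0,${max - 1}}`;
  return new RegExp(`(?<=^\\?${guard}${earlier}&?)[a-z=]+`, "g");
}

const input = "?some=querystring&ssembly=containing&n=indeterminate&mount=of&ll=potentially&ccordant=matches";

const range = input.match(leadingSegments(2, 4));
const exact = input.match(leadingSegments(3, 3));

console.log(range); // the first four segments
console.log(exact); // the first three segments
```

Calling `leadingSegments(3, 3)` produces the same pattern source as the Exact {3} form above.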