regexstringcontext-free-grammar

Regex-like syntax or CFG for generating cartesian product of concatenated string variables and literals


I am writing a simulator, and would like to run studies by invoking a lot of instances of the simulator, using different sets of command-line arguments. I have read this question and several others, and they seem close, but I'm actually not looking for random data fulfilling a particular regex, I would like the set of all strings that match the regex. An example input file would look something like this:

myprogram.{version1|version2} -arg1 {1|2|4} {-arg2|}

or:

myprogram.{0} -arg1 {1} {2}
0: "version1" "version2"
1: "1" "2" "4"
2: "-arg2" ""

and would produce:

myprogram.version1 -arg1 1 -arg2
myprogram.version1 -arg1 1
myprogram.version1 -arg1 2 -arg2
myprogram.version1 -arg1 2
myprogram.version1 -arg1 4 -arg2
myprogram.version1 -arg1 4
myprogram.version2 -arg1 1 -arg2
myprogram.version2 -arg1 1
myprogram.version2 -arg1 2 -arg2
myprogram.version2 -arg1 2
myprogram.version2 -arg1 4 -arg2
myprogram.version2 -arg1 4

I would imagine something like this already exists, I just don't know the correct term to search for. Any help would be much appreciated. I can implement an abstract technique or algorithm myself if need be, but if it's a pre-existing tool I would prefer it to be free (at least as in beer) and run on Linux.

I know I am probably leaving some details out, and can be more specific about the appropriate things if necessary, rather than inundate people with a lot of detail up front. It is entirely possible that I am going about this the wrong way, and I am welcome to all solutions, even if they solve my problem in a different way.

Most importantly, this solution should not require me to write any extra parsing code if I want to add more argument options to the "cross-product" of strings I generate. I already have a Perl script that does this with a set of nested for loops over each "variable" that must change every time I change the number or nature of variables.


Solution

  • As long as the braces are not nested, regular expressions will work fine. If you require nesting, you could add some extra recursion in the implementation language.

    Here is an example in Python:

    import re
    
    def make_choices(template):
        pat = re.compile(r'(.*?)\{([^{}]+)\}',re.S)
    
        # tokenize the string
        last_end = 0
        choices = []
        for match in pat.finditer(template):
            prefix, alts = match.groups()
            if prefix:
                choices.append((prefix,)) # as a tuple
            choices.append(alts.split("|"))
            last_end = match.end()
    
        suffix = template[last_end:]
        if suffix:
            choices.append((suffix,))
    
        # recursive inner function
        def chooser(index):
            if index >= len(choices):
                yield []
            else:
                for alt in choices[index]:
                    for result in chooser(index+1):
                        result.insert(0,alt)
                        yield result
    
        for result in chooser(0):
            yield ''.join(result)
    

    Example:

    >>> for result in make_choices('myprogram.{version1|version2} -arg1 {1|2|4} {-arg2|}'):
    ...     print result
    ...
    myprogram.version1 -arg1 1 -arg2
    myprogram.version1 -arg1 1
    myprogram.version1 -arg1 2 -arg2
    myprogram.version1 -arg1 2
    myprogram.version1 -arg1 4 -arg2
    myprogram.version1 -arg1 4
    myprogram.version2 -arg1 1 -arg2
    myprogram.version2 -arg1 1
    myprogram.version2 -arg1 2 -arg2
    myprogram.version2 -arg1 2
    myprogram.version2 -arg1 4 -arg2
    myprogram.version2 -arg1 4
    

    You could use os.system() to execute the commands from within Python:

    #!/etc/env python
    import sys, os
    
    template = ' '.join(sys.args)
    failed = 0
    total = 0
    for command in make_choices(template):
        print command
        if os.system(command):
            print 'FAILED'
            failed += 1
        else:
            print 'OK'
        total += 1
    
    print
    print '%d of %d failed.' % (failed,total)
    
    sys.exit(failed > 0)
    

    And then on the command line:

    user:/home/> template.py 'program.{version1|version2}'
    program.version1
    OK
    program.version2
    FAILED
    
    1 of 2 failed.