regex

Limit repetitions of character to multiple fixed lengths (and not ranges)


I have identifiers that appear at the end of some file names, separated from the rest of the name by a delimiter. An identifier is always exactly 8 or 12 characters long; any other length is invalid.

I would like to keep the pattern as simple as possible but I don't think there's a mechanism (in standard regular expression syntax) to do multiple lengths without repeating myself.

This will not work for me since it allows lengths of 9-11 which are invalid:

-[A-Za-z0-9]{8,12}$

I could do this, but I don't like that I have to repeat the character class:

-(?:[A-Za-z0-9]{8}|[A-Za-z0-9]{12})$

It gets a little unruly when there are more lengths I need to support:

-(?:[A-Za-z0-9]{8}|[A-Za-z0-9]{12}|[A-Za-z0-9]{16}|[A-Za-z0-9]{20}|[A-Za-z0-9]{24}|[A-Za-z0-9]{28}|[A-Za-z0-9]{32})$

Are there any other more concise ways to do this or is this the best I can do?

I will accept anything that works for my case, but it would be great if there were an option that works for arbitrary lengths.


Solution

  • My idea is similar to blhsing's in that I would suggest checking the length up front. However, I would define the allowed lengths positively rather than excluding the invalid ones. For illustration I use the lengths 8, 12, and 14, so that they are not all multiples of 4.

    My regex attempt would be:

    -(?=(?:.{8}|.{12}|.{14})$)[A-Za-z0-9]+$
    

    See a demo on regex101. The input was taken from Hao Wu's demo.

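    Explanation:

    The lookahead (?=(?:.{8}|.{12}|.{14})$) asserts, right after the delimiter, that exactly 8, 12, or 14 characters remain before the end of the string, without consuming them. [A-Za-z0-9]+$ then checks that all of those remaining characters are alphanumeric. The character class is written only once; only the short .{n} alternatives repeat.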

    The reason I bothered to add another answer is that, in a programming language like Python, you can now generate the pattern from a list of allowed lengths like so:

    import re
    
    strings=[
        "some file name-ASDFghjk",
        "some file name-ASDFghjk12",
        "some file name-ASDFghjk1234",
        "some file name-ASDFghjk123456",
        "some file name-ASDFghjk12345678"
    ]
    
    allowed_len=[8,12,14]
    
    # Concatenate the allowed lengths into ".{a}|.{b}|...".
    joined_len = "|".join(".{" + str(n) + "}" for n in allowed_len)
    
    # Insert the concatenation into the regex pattern to "outsource" this step.
    # The remaining pattern can easily be maintained here.
    pat = re.compile(rf"-(?=(?:{joined_len})$)[A-Za-z0-9]+$")
    
    
    # Check each string: a match object for lengths 8, 12, and 14, None otherwise.
    [pat.search(s) for s in strings]
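
The same idea can be wrapped in a small helper so that the allowed lengths, the character class, and the delimiter are all parameters. This is only a sketch under stated assumptions: the function name build_suffix_pattern, its parameters, and the capture group around the identifier are hypothetical additions for illustration and are not part of the pattern above.

```python
import re


def build_suffix_pattern(lengths, char_class="[A-Za-z0-9]", delimiter="-"):
    """Compile a pattern for: delimiter + identifier of an allowed length at end of string.

    Hypothetical helper: the name, parameters, and capture group are assumptions.
    char_class is inserted verbatim, so it must already be valid regex syntax.
    """
    if not lengths:
        raise ValueError("at least one length is required")
    # One ".{n}" alternative per allowed length, e.g. ".{8}|.{12}|.{14}".
    alternatives = "|".join(f".{{{n}}}" for n in sorted(set(lengths)))
    # The lookahead fixes the remaining length; the class is written only once.
    return re.compile(
        rf"{re.escape(delimiter)}(?=(?:{alternatives})$)({char_class}+)$"
    )


pat = build_suffix_pattern([8, 12, 14])

names = [
    "some file name-ASDFghjk",          # 8 characters: valid
    "some file name-ASDFghjk12",        # 10 characters: invalid
    "some file name-ASDFghjk1234",      # 12 characters: valid
    "some file name-ASDFghjk123456",    # 14 characters: valid
    "some file name-ASDFghjk12345678",  # 16 characters: invalid
]

for name in names:
    m = pat.search(name)
    # Print the captured identifier, or None when the length is not allowed.
    print(f"{name!r}: {m.group(1) if m else None}")
```

Note that the lookahead uses the dot, which does not match a newline by default, so this sketch assumes single-line file names.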