I have some identifiers that appear at the end of file names, separated from the rest of the name by a delimiter. An identifier is always either 8 or 12 characters long; any other length is invalid.
I would like to keep the pattern as simple as possible but I don't think there's a mechanism (in standard regular expression syntax) to do multiple lengths without repeating myself.
This will not work for me since it allows lengths of 9-11 which are invalid:
-[A-Za-z0-9]{8,12}$
I could do this but I don't like that I have to repeat the character groups:
-(?:[A-Za-z0-9]{8}|[A-Za-z0-9]{12})$
It gets a little unruly when there are more lengths I need to support:
-(?:[A-Za-z0-9]{8}|[A-Za-z0-9]{12}|[A-Za-z0-9]{16}|[A-Za-z0-9]{20}|[A-Za-z0-9]{24}|[A-Za-z0-9]{28}|[A-Za-z0-9]{32})$
Is there a more concise way to do this, or is this the best I can do?
I will accept anything that works for my case, but it would be great if there were an option that works for arbitrary lengths.
My idea is similar to that of blhsing in that I would suggest checking the length up front. However, I would suggest defining the allowed lengths positively. Just for illustration I use the lengths 8, 12, and 14 so that not all of them are multiples of 4.
My regex attempt would be:
-(?=(?:.{8}|.{12}|.{14})$)[A-Za-z0-9]+$
See a demo on regex101. The input was taken from Hao Wu's demo.
Explanation:
-
: Anchors the pattern at the literal - delimiter.
(?=(?: ... )$)
: Looks ahead and checks the allowed lengths of the string between - and the end of the line.
.{8}|.{12}|.{14}
: The allowed lengths, in this case 8, 12, and 14.
[A-Za-z0-9]+$
: Finally asserts the composition of the string up to the end of the line.
The reason I bothered to add another answer is that in a programming language like Python you can now generate the pattern from a list of allowed lengths, like so:
import re
strings=[
"some file name-ASDFghjk",
"some file name-ASDFghjk12",
"some file name-ASDFghjk1234",
"some file name-ASDFghjk123456",
"some file name-ASDFghjk12345678"
]
allowed_len = [8, 12, 14]
# Concatenate the allowed lengths into ".{a}|.{b}|...".
joined_len = "|".join(".{" + str(n) + "}" for n in allowed_len)
# Interpolate the alternation into the regex pattern to "outsource" this step.
# The remaining pattern can now easily be maintained here.
pat = re.compile(rf"-(?=(?:{joined_len})$)[A-Za-z0-9]+$")
# Validate the output: a match object for valid names, None otherwise.
print([re.search(pat, s) for s in strings])
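The same idea can be wrapped in a small helper so that the lengths, the character set, and the delimiter are each written only once. This is a minimal sketch, not a definitive implementation: the name build_suffix_pattern and its parameters charset and delimiter are made up for illustration, and charset is assumed to be a valid character-class body. It is applied here to the longer list of lengths from the question (8, 12, ..., 32).

```python
import re


def build_suffix_pattern(lengths, charset="A-Za-z0-9", delimiter="-"):
    """Compile a pattern for a trailing identifier with one of the given lengths.

    Hypothetical helper: `charset` is inserted verbatim into a character
    class, so it is assumed to be a valid class body such as "A-Za-z0-9".
    """
    # The lookahead pins down the allowed total lengths, so the character
    # class itself only has to appear once in the pattern.
    length_alternatives = "|".join(f".{{{n}}}" for n in sorted(set(lengths)))
    return re.compile(
        rf"{re.escape(delimiter)}(?=(?:{length_alternatives})$)[{charset}]+$"
    )


# The lengths from the question: 8, 12, 16, ..., 32.
pat = build_suffix_pattern(range(8, 33, 4))

# Try every identifier length from 7 to 33 and keep the accepted ones.
accepted = [
    n for n in range(7, 34)
    if pat.search("some file name-" + "a" * n)
]
print(accepted)  # [8, 12, 16, 20, 24, 28, 32]
```

The lookahead only counts characters, so an identifier of the right length that contains a character outside the set (for example an underscore) is still rejected by the character class that follows.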