At work I came upon a problem that required parsing filenames that did not follow a consistent pattern. I spent a significant amount of time trying to implement a pure regex solution, but settled for some string splits in code, with regex applied to the split parts. It worked out well, but I want to revisit this and learn whether there is a valid way to do it all in regex.
I cannot sort out how to not match `[\w\.\-]+` if `v?\d+\.\d+` follows directly after it. I've looked into lookahead / lookbehind and can't really understand how to apply either to this situation.
My "closest" attempts are below; the regex101 links contain more detail about the matches and errors.
https://regex101.com/r/nz4Bxl/1
(?P<name>([\w\-]+)+\.?)(?P<version>(\.?\d+\.?)+\.?)(\.[a-z]{3,})/gm
With this pattern, the majority of strings are matched (although most versions are missing the leading digit and period), except for:
appname.v1.2.3.exe (no name matched, version missing 1.)
appname.v1.02.03.exe (no name matched, version missing 1.)
https://regex101.com/r/FyG7Mm/1/
(?P<name>([\w\-]+)]?)(?P<version>((\d+\.){1,}\d+\.?))\.[a-z]{3,}/gm
With this pattern, the majority of the input strings are matched, except for:
app-name.1.2.3.exe (no matches)
appname.v1.2.3.exe (version matched, but no name match)
appname.v1.02.03.exe (version matched, but no name match)
appname71name.1.2.3.exe (no matches)
Neither of the above patterns matches every input. The input text and desired results are below:
input --> desired result
app-name.1.2.3.exe --> name: app-name version: 1.2.3
appname.v1.2.3.exe --> name: appname version: 1.2.3
appname.v1.02.03.exe --> name: appname version: 1.02.03
appname71name.1.2.3.exe --> name: appname71name version: 1.2.3
appname71.1.2.3.exe --> name: appname71 version: 1.2.3
appname-1.2.3.exe --> name: appname version: 1.2.3
appname-v1.2.3.exe --> name: appname version: 1.2.3
appname-v1.02.03.exe --> name: appname version: 1.02.03
appname71name-1.2.3.exe --> name: appname71name version: 1.2.3
appname71-1.2.3.exe --> name: appname71 version: 1.2.3
appname_1.2.3.exe --> name: appname version: 1.2.3
appname_v1.2.3.exe --> name: appname version: 1.2.3
appname_v1.02.03.exe --> name: appname version: 1.02.03
appname71name_1.2.3.exe --> name: appname71name version: 1.2.3
appname71_1.2.3.exe --> name: appname71 version: 1.2.3
You could start the match with 1 or more word characters other than an underscore, and then optionally repeat a `-` or `_` followed by more of those characters, as long as that `-` or `_` is not directly followed by an optional `v`, 1+ digits and a `.`
^(?P<name>[^\W_]+(?:[-_](?!v?\d+\.)[^\W_]+)*)[-_.]?v?(?P<version>\d+(?:\.\d+)*)\.[a-z]{3,}$
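To see what the negative lookahead contributes, here is a minimal sketch that assumes Python's `re` module (the `(?P<name>...)` syntax suggests it, but the question does not say so). `WITHOUT_LOOKAHEAD` and `split_name_version` are hypothetical names introduced only for this comparison; the second pattern is the same as the one above with `(?!v?\d+\.)` removed.

```python
import re

# The suggested pattern, split over two string literals for readability.
WITH_LOOKAHEAD = re.compile(
    r"^(?P<name>[^\W_]+(?:[-_](?!v?\d+\.)[^\W_]+)*)"
    r"[-_.]?v?(?P<version>\d+(?:\.\d+)*)\.[a-z]{3,}$"
)

# Hypothetical variant for comparison only: identical, minus the lookahead.
WITHOUT_LOOKAHEAD = re.compile(
    r"^(?P<name>[^\W_]+(?:[-_][^\W_]+)*)"
    r"[-_.]?v?(?P<version>\d+(?:\.\d+)*)\.[a-z]{3,}$"
)


def split_name_version(pattern, filename):
    """Return (name, version) if the pattern matches, otherwise None."""
    m = pattern.match(filename)
    return (m.group("name"), m.group("version")) if m else None


sample = "appname-1.2.3.exe"
with_result = split_name_version(WITH_LOOKAHEAD, sample)
without_result = split_name_version(WITHOUT_LOOKAHEAD, sample)

print(with_result)     # ('appname', '1.2.3')
print(without_result)  # ('appname-1', '2.3')
```

Without the lookahead, the repeated group is free to consume `-1` as part of the name, so the first version number ends up in the wrong group. The lookahead stops the name before a separator that begins the version.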
The pattern in parts matches:
- `^` Start of string
- `(?P<name>` Named group `name`
  - `[^\W_]+` Match 1+ word chars other than `_`
  - `(?:` Non-capture group to repeat as a whole part
    - `[-_]` Match either `-` or `_`
    - `(?!v?\d+\.)` Negative lookahead, assert that from the current position there is not an optional `v` followed by 1+ digits and `.`
    - `[^\W_]+` Match 1+ word chars other than `_`
  - `)*` Close the non-capture group and optionally repeat it
- `)` Close group `name`
- `[-_.]?v?` Match an optional `-`, `_` or `.` followed by an optional `v` char
- `(?P<version>` Named group `version`
  - `\d+(?:\.\d+)*` Match 1+ digits and optionally repeat matching `.` and 1+ digits
- `)` Close group `version`
- `\.[a-z]{3,}` Match `.` and 3 or more chars a-z
- `$` End of string
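As a quick check, the pattern can be run against every input from the question. This is a sketch assuming Python's `re` module; `parse_filename`, `EXPECTED` and `mismatches` are hypothetical names, and the expected values are copied from the question's table.

```python
import re

PATTERN = re.compile(
    r"^(?P<name>[^\W_]+(?:[-_](?!v?\d+\.)[^\W_]+)*)"
    r"[-_.]?v?(?P<version>\d+(?:\.\d+)*)\.[a-z]{3,}$"
)

# Inputs and desired (name, version) results, copied from the question.
EXPECTED = {
    "app-name.1.2.3.exe": ("app-name", "1.2.3"),
    "appname.v1.2.3.exe": ("appname", "1.2.3"),
    "appname.v1.02.03.exe": ("appname", "1.02.03"),
    "appname71name.1.2.3.exe": ("appname71name", "1.2.3"),
    "appname71.1.2.3.exe": ("appname71", "1.2.3"),
    "appname-1.2.3.exe": ("appname", "1.2.3"),
    "appname-v1.2.3.exe": ("appname", "1.2.3"),
    "appname-v1.02.03.exe": ("appname", "1.02.03"),
    "appname71name-1.2.3.exe": ("appname71name", "1.2.3"),
    "appname71-1.2.3.exe": ("appname71", "1.2.3"),
    "appname_1.2.3.exe": ("appname", "1.2.3"),
    "appname_v1.2.3.exe": ("appname", "1.2.3"),
    "appname_v1.02.03.exe": ("appname", "1.02.03"),
    "appname71name_1.2.3.exe": ("appname71name", "1.2.3"),
    "appname71_1.2.3.exe": ("appname71", "1.2.3"),
}


def parse_filename(filename):
    """Return (name, version), or None if the filename does not match."""
    m = PATTERN.match(filename)
    if m is None:
        return None
    return m.group("name"), m.group("version")


# Collect every input whose parsed result differs from the desired one.
mismatches = {
    filename: parse_filename(filename)
    for filename, wanted in EXPECTED.items()
    if parse_filename(filename) != wanted
}
print(mismatches or "all inputs parsed as desired")
```

Keeping the table as data makes it easy to add further filenames later and see at once whether the pattern still holds.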