python regex

Regex to match a whole number not ending in some digits


I've not been able to construct a pattern that returns all whole numbers that don't end in a given sequence of digits. The numbers can be of any length, even a single digit, but they will always be whole. Additionally, multiple numbers can appear on the same line of text, and I want to match them all. Each number is always followed by a single space, the end of the line, or the end of the text. I'm matching in Python 3.12.

For example, over the text '12345 67890 123175 9876', let's say I want to get all numbers not ending in 175.

I would want the following matches:

12345
67890
9876

I've tried using the following:

from re import findall

text = "12345 67890 123175 9876"
matches = findall(r"\d+(?<!175)(\b|$)", text)
print(matches)
> ['', '', '']
matches = findall(r"\d+(?!175)(\b|$)", text)
print(matches)
> ['', '', '', '']
matches = findall(r"\d+(?<!175)", text)
print(matches)
> ['12345', '67890', '12317', '9876']
matches = findall(r"\d+(?:175)", text)
print(matches)
> ['123175']

Solution

  • You can use a negative lookbehind, as in .*(?<!a), which ensures the string does not end with a.

    \d++(?<!175)
    


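    Note that possessive quantifiers (++) were introduced in Python 3.11. Your 2nd approach from revision 1 was close, but not correct: the greedy quantifier (+) consumes all the digits and then backtracks, giving up the final digit so that the lookbehind passes on the shorter match 12317. The possessive quantifier never gives digits back, so 123175 is rejected as a whole.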
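For reference, here is a minimal sketch that runs the possessive pattern over the question's sample text. The second pattern, with a trailing \b, is not part of the answer above; it is an assumed alternative for interpreters older than 3.11, where `\d++` is rejected as a syntax error. The last call illustrates why the attempts with `(\b|$)` printed empty strings: when a pattern contains a capturing group, `re.findall` returns the group's contents rather than the whole match.

```python
import re
import sys

text = "12345 67890 123175 9876"

# Possessive quantifier, as in the answer (Python 3.11+ only).
# On older interpreters r"\d++" raises re.error, hence the version guard.
if sys.version_info >= (3, 11):
    possessive = re.findall(r"\d++(?<!175)", text)
    print(possessive)  # ['12345', '67890', '9876']

# Assumed alternative for older versions: the trailing \b stops the greedy
# \d+ from settling on a shortened match such as 12317.
bounded = re.findall(r"\d+(?<!175)\b", text)
print(bounded)  # ['12345', '67890', '9876']

# A capturing group makes findall return the group, not the whole match,
# which is why (\b|$) in the question produced empty strings.
captured = re.findall(r"\d+(?<!175)(\b|$)", text)
print(captured)  # ['', '', '']
```

If the alternation is still wanted, writing it as a non-capturing group, `(?:\b|$)`, keeps `findall` returning the full matches.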