
re find date in string


Can someone explain how to use re.findall to extract only the dates from the following strings? The date can be in either the format 1.1.2001 or 11.11.2001, so the number of digits for the day and month varies:

import re
text1 = "This is my date: 1.1.2001 fooo bla bla bla"
text2 = "This is my date: 11.11.2001 bla bla foo bla"

I know I should use re.findall(pattern, string), but to be honest I am completely confused by regex patterns and don't know how to build one that fits my case.

I have found something like the line below, but I don't understand why there is an r before the pattern. Does \ mean start of string? Does d mean digit? And does the number in {} mean how many?

match = re.search(r'\d{2}.\d{2}.\d{4}', text)

Thanks a lot!


Solution

  • The r prefix on a string literal tells the Python interpreter that it is a raw string, which means backslashes \ are not treated as escape characters and stay as literal backslashes. This is useful with the re module because regex patterns use backslashes heavily; a raw string saves you from writing \\ (an escaped backslash) for each one. Also, the backslash does not mean start of string (that is ^): \d together is a single token that matches one digit character.

    What you're looking for is this:

    match = re.search(r'\d{1,2}\.\d{1,2}\.\d{4}', text)
    

    The {} tells regex how many occurrences of the preceding token to match. {1,2} means a minimum of 1 and a maximum of 2 \d, and {4} means exactly 4 occurrences.

    Note that the . is also escaped as \., because in regex an unescaped . matches any character (except a newline). Here you want a literal dot, so you escape it to tell regex to match only that character.

    See this for more explanation: https://regex101.com/r/v2QScR/1
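Since the question asks about re.findall specifically, here is a minimal, self-contained sketch that runs the pattern above over both example strings. It also shows why the question's original \d{2}.\d{2}.\d{4} misses 1.1.2001, and what goes wrong when the dot is left unescaped. The variable names and the \b-bounded variant at the end are illustrative additions, not part of the original answer.

```python
import re

# Pattern from the answer: 1-2 digit day, 1-2 digit month, 4-digit year.
# The dots are escaped so they match only a literal '.'.
DATE_PATTERN = r'\d{1,2}\.\d{1,2}\.\d{4}'

text1 = "This is my date: 1.1.2001 fooo bla bla bla"
text2 = "This is my date: 11.11.2001 bla bla foo bla"

# re.findall returns every non-overlapping match as a list of strings.
# The pattern has no capture groups, so each item is the whole match.
dates1 = re.findall(DATE_PATTERN, text1)  # ['1.1.2001']
dates2 = re.findall(DATE_PATTERN, text2)  # ['11.11.2001']

# re.search returns only the first match, as a match object (or None).
first = re.search(DATE_PATTERN, text2)
first_date = first.group() if first else None  # '11.11.2001'

# A raw string keeps the backslash: r'\d' is the same two characters as '\\d'.
assert r'\d' == '\\d'

# The question's pattern demands exactly two digits for day and month,
# so it cannot match the single-digit date 1.1.2001.
two_digit_only = re.findall(r'\d{2}.\d{2}.\d{4}', text1)  # []

# Without escaping, '.' matches any character, so '1-1-2001' slips through.
loose = re.findall(r'\d{1,2}.\d{1,2}.\d{4}', "on 1-1-2001")  # ['1-1-2001']
strict = re.findall(DATE_PATTERN, "on 1-1-2001")             # []

# Optional refinement (an assumption, not from the original answer):
# \b word boundaries stop a match from starting or ending inside a longer
# number, so '111.11.20011' no longer yields '11.11.2001'.
BOUNDED_PATTERN = r'\b\d{1,2}\.\d{1,2}\.\d{4}\b'
unbounded_hit = re.findall(DATE_PATTERN, "111.11.20011")    # ['11.11.2001']
bounded_hit = re.findall(BOUNDED_PATTERN, "111.11.20011")   # []

print(dates1, dates2, first_date)
```

re.findall is the better fit when a string may contain several dates, since it collects all of them in one call; re.search is enough when you only need the first one.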