python regex string substring string-search

Find String Between Two Substrings in Python When There Is a Space After the First Substring


While there are several similar posts on StackOverflow, none of them covers the case where the target string starts one space after one of the substrings.

I have the following string (example_string): <insert_randomletters>[?] I want this string.Reduced<insert_randomletters>

I want to extract "I want this string." from the string above. The randomletters will always change; however, the quote "I want this string." will always sit between [?] (with a space after the closing square bracket) and Reduced.

Right now, I can do the following to extract "I want this string".

target_quote_object = re.search('[?](.*?)Reduced', example_string)
target_quote_text = target_quote_object.group(1)
print(target_quote_text[2:])

This eliminates the ] and the space that always appear at the start of my extracted string, thus only printing "I want this string." However, this solution seems ugly, and I'd rather make re.search() return the correct target string without any modification. How can I do this?


Solution

  • Your '[?](.*?)Reduced' pattern matches a literal ?, then captures any 0+ chars other than line break chars, as few as possible up to the first Reduced substring. That [?] is a character class formed with unescaped brackets, and the ? inside a character class is a literal ? char. That is why your Group 1 contains the ] and a space.

    To make your regex match [?] you need to escape [ and ? so that they are matched as literal chars (the ] needs no escaping outside a character class). Besides, you need to add a space after ] to make sure it does not land in Group 1. A better idea is to use \s* (zero or more whitespace chars) or \s+ (one or more whitespace chars).

    Use

    re.search(r'\[\?]\s*(.*?)Reduced', example_string)
    

    See the regex demo.

    import re
    # \[\?] matches a literal "[?]"; \s* consumes the whitespace after it
    rx = r"\[\?]\s*(.*?)Reduced"
    s = "<insert_randomletters>[?] I want this string.Reduced<insert_randomletters>"
    m = re.search(rx, s)
    if m:
        print(m.group(1))
    # => I want this string.
    

    See the Python demo.
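
If the two delimiters may change, one possible variation is to let re.escape() do the escaping instead of writing the backslashes by hand. The sketch below is only an illustration: the helper name between() is hypothetical, not part of the re module, and it assumes both delimiters are plain literal strings. It also prints what the original unescaped pattern captures, for comparison.

```python
import re


def between(text, start, end):
    """Return the text between the literal markers start and end, or None."""
    # re.escape() backslash-escapes regex metacharacters such as [ and ?,
    # so both markers are matched literally.
    pattern = re.escape(start) + r"\s*(.*?)" + re.escape(end)
    m = re.search(pattern, text)
    return m.group(1) if m else None


s = "<insert_randomletters>[?] I want this string.Reduced<insert_randomletters>"

# Unescaped, [?] is a character class that matches only "?",
# so the "] " ends up at the start of group 1.
print(repr(re.search(r"[?](.*?)Reduced", s).group(1)))
# => '] I want this string.'

# With escaped markers and \s*, the bracket and the space stay outside the group.
print(repr(between(s, "[?]", "Reduced")))
# => 'I want this string.'
```

The helper returns None when either marker is missing, so the caller can check the result before using it, in the same way as the `if m:` check above.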