python-3.x regex

Regex: Match all characters between two strings


Example: In the Netherlands, peanut butter is called "pindakaas" (peanut cheese) rather than "pindaboter" (peanut butter) because the word butter is only supposed to be used with products that contain actual butter.

I want to match everything between "cheese" and "butter", and vice versa.

Goals:

EDIT: The language is Python 3.7, and the regex I'm currently using is cheese(.*?)butter.
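For reference, here is a minimal sketch (standard-library re module only, same example text) of why the current pattern falls short. It only covers the cheese-to-butter direction. Simply adding the reverse direction with | is also not enough: re.findall returns non-overlapping matches, so the "cheese" that ends the first match is already consumed and cannot start a second one.

```python
import re

text = ('In the Netherlands, peanut butter is called "pindakaas" (peanut cheese) '
        'rather than "pindaboter" (peanut butter) because the word butter is only '
        'supposed to be used with products that contain actual butter.')

# Current pattern: only cheese -> butter; findall returns the captured group.
one_way = re.findall(r'cheese(.*?)butter', text)
print(one_way)    # [') rather than "pindaboter" (peanut ']

# Naive alternation: matches cannot overlap, so the "cheese" ending the
# first match is consumed and the cheese -> butter span is never found.
both_ways = re.findall(r'butter.*?cheese|cheese.*?butter', text)
print(both_ways)  # ['butter is called "pindakaas" (peanut cheese']
```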


Solution

  • If you install the third-party regex package from PyPI, you can do overlapped searches:

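    import regex as re  # third-party package: pip install regex
    
    text = 'In the Netherlands, peanut butter is called "pindakaas" (peanut cheese) rather than "pindaboter" (peanut butter) because the word butter is only supposed to be used with products that contain actual butter.'
    
    # overlapped=True lets a new match start inside the previous one,
    # so the "cheese" that ends the first match can begin the second.
    matches = re.findall(r'\bbutter\b.*?\bcheese\b|\bcheese\b.*?\bbutter\b', text, overlapped=True)
    print(matches)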
    

    Prints:

    ['butter is called "pindakaas" (peanut cheese', 'cheese) rather than "pindaboter" (peanut butter']
    

    I used your basic regex but required butter and cheese to fall on word boundaries (e.g. \bbutter\b) by placing \b before and after each word, so that longer words merely containing them are not matched. Remove the \b anchors if you don't need that.
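If installing a third-party package is not an option, a common workaround with the standard-library re module (a different technique from the answer above) is to wrap the pattern in a zero-width lookahead that contains a capturing group. The lookahead consumes no characters, so a match is attempted at every position and the captured spans are free to overlap. The sketch below assumes the same example text. It also shows how to keep only the text strictly between the two words, and what the \b anchors change.

```python
import re

text = ('In the Netherlands, peanut butter is called "pindakaas" (peanut cheese) '
        'rather than "pindaboter" (peanut butter) because the word butter is only '
        'supposed to be used with products that contain actual butter.')

# Zero-width lookahead: nothing is consumed, so the captured spans may overlap.
# findall returns the contents of the single capturing group.
spans = re.findall(r'(?=(\bbutter\b.*?\bcheese\b|\bcheese\b.*?\bbutter\b))', text)
print(spans)

# Capture only the inner text in each branch; exactly one of the two groups
# participates in any given match, the other is None.
between = []
for m in re.finditer(r'(?=\bbutter\b(.*?)\bcheese\b|\bcheese\b(.*?)\bbutter\b)', text):
    between.append(m.group(1) if m.group(1) is not None else m.group(2))
print(between)

# Effect of \b: without it, "cheese" inside "cheesecake" also counts.
loose = re.findall(r'(?=(cheese.*?butter))', 'cheesecake and butter')
strict = re.findall(r'(?=(\bcheese\b.*?\bbutter\b))', 'cheesecake and butter')
print(loose, strict)  # ['cheesecake and butter'] []
```

Here spans holds the same two strings as the regex-package output above, and between holds only the text between the words, without "butter"/"cheese" themselves.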