Example: In the Netherlands, peanut butter is called "pindakaas" (peanut cheese) rather than "pindaboter" (peanut butter) because the word butter is only supposed to be used with products that contain actual butter.
I want to match everything between cheese and butter, and vice versa.
Goals:
EDIT: The language is Python 3.7, and the regex I'm currently using is cheese(.*?)butter.
If you install the regex package from PyPI, you can do overlapped searches:
import regex as re
text = 'In the Netherlands, peanut butter is called "pindakaas" (peanut cheese) rather than "pindaboter" (peanut butter) because the word butter is only supposed to be used with products that contain actual butter.'
l = re.findall(r'\bbutter\b.*?\bcheese\b|\bcheese\b.*?\bbutter\b', text, overlapped=True)
print(l)
Prints:
['butter is called "pindakaas" (peanut cheese', 'cheese) rather than "pindaboter" (peanut butter']
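If installing a third-party package is not an option, something similar can probably be done with the standard-library re module by wrapping the pattern in a lookahead with a capturing group. This is only a sketch of that alternative, not what the regex package does internally; the variable names plain and overlapping are mine. It also shows why an overlapped search is needed at all: an ordinary findall consumes the first match, so the second span, which starts inside the first, is never found.

```python
import re

text = ('In the Netherlands, peanut butter is called "pindakaas" (peanut cheese) '
        'rather than "pindaboter" (peanut butter) because the word butter is only '
        'supposed to be used with products that contain actual butter.')

pattern = r'\bbutter\b.*?\bcheese\b|\bcheese\b.*?\bbutter\b'

# Ordinary search: each match consumes its text, so a span that begins
# inside an earlier match is skipped.
plain = re.findall(pattern, text)

# Lookahead search: the lookahead itself is zero-width, so the scan advances
# one character at a time and the capturing group records every span.
overlapping = re.findall(r'(?=(' + pattern + r'))', text)

print(len(plain))        # number of non-overlapping matches
print(len(overlapping))  # number of overlapping matches
print(overlapping)
```

On the sample sentence the ordinary search finds one span and the lookahead version finds the same two spans as the overlapped search above.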
I used your basic regex but required butter and cheese to fall on word boundaries by placing \b before and after each word, e.g. \bbutter\b. Feel free to remove the \b anchors if you don't need them.
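To illustrate what the word boundaries change, here is a small sketch using the standard re module. The sample string is made up for the illustration (it is not from the question) and contains the longer words cheesecake and butterscotch.

```python
import re

# Hypothetical sample text, chosen so that "cheese" and "butter" also
# appear inside longer words.
sample = 'cheesecake with butterscotch, then cheese and butter'

# Without \b the pattern also matches inside "cheesecake" and "butterscotch".
loose = re.findall(r'cheese.*?butter', sample)

# With \b only the standalone words are accepted.
strict = re.findall(r'\bcheese\b.*?\bbutter\b', sample)

print(loose)
print(strict)
```

Whether you want the loose or the strict behaviour depends on your data; with the strict version, words such as cheesecake are ignored.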