I want to parse my email inbox, find marketing emails that contain coupon codes, and extract the code from each one. The logic I have written works on only one type of data.
import re

def extract_promo_code(body):
    # Use regular expressions to find promo code
    promo_code_pattern = r'(?i)(?:Enter\s+Code|Enter\s+promo)(?:[\s\n]*)([A-Z0-9]+)'
    match = re.search(promo_code_pattern, body)
    if match:
        promo_code = match.group(1)
        # Remove any non-alphanumeric characters from the promo code
        promo_code = re.sub(r'[^A-Z0-9]', '', promo_code)
        return promo_code
    else:
        return None
Here are a couple of samples from which I want to extract the coupon code:
"Enter code at checkout.* Offer valid until October 6, 2023, 11:59pm CT MKEA15EMYZGP8W"
"Enter code JSB20GR335F4 Ends September 21, 2023, at 11:59pm CT.*"
I want the code to catch the first promo code that comes after the text "Enter Code" or "enter promo". The promo code consists of a mix of digits and uppercase letters, and it should be found even if there are line breaks and spaces between the text and the promo code.
The above code runs fine for sample 2 but doesn't catch the code in sample 1.
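The failure on sample 1 can be reproduced in isolation. The sketch below reuses the pattern from the function above unchanged; the variable names are only illustrative. Because `(?i)` applies to the whole pattern, `[A-Z0-9]+` also matches lowercase letters, so the capture group takes the word right after "Enter code", and the clean-up step then strips it to an empty string:

```python
import re

# The pattern from the function above, unchanged.
pattern = r'(?i)(?:Enter\s+Code|Enter\s+promo)(?:[\s\n]*)([A-Z0-9]+)'

sample_1 = "Enter code at checkout.* Offer valid until October 6, 2023, 11:59pm CT MKEA15EMYZGP8W"
sample_2 = "Enter code JSB20GR335F4 Ends September 21, 2023, at 11:59pm CT.*"

# With (?i), [A-Z0-9]+ is case-insensitive, so it grabs the next word.
captured_1 = re.search(pattern, sample_1).group(1)
captured_2 = re.search(pattern, sample_2).group(1)

# The clean-up removes everything that is not an uppercase letter or digit.
cleaned_1 = re.sub(r'[^A-Z0-9]', '', captured_1)

print(repr(captured_1))  # 'at'
print(repr(cleaned_1))   # ''
print(repr(captured_2))  # 'JSB20GR335F4'
```

So for sample 1 the function returns an empty string rather than the code, while sample 2 works only because the code directly follows "Enter code".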
You can use the pattern below (adjust it as needed; I assumed the promo code has at least 10 characters) (regex101 demo):
import re
text = """\
Enter code at checkout.*
Offer valid until October 6, 2023, 11:59pm CT MKEA15EMYZGP8W
Enter code JSB20GR335F4 Ends September 21, 2023, at 11:59pm CT.*
"""
pat = r"""(?s)Enter (?:code|promo).*?\b([A-Z\d]{10,})"""
for code in re.findall(pat, text):
    print(code)
Prints:
MKEA15EMYZGP8W
JSB20GR335F4
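If you want this as a drop-in replacement for `extract_promo_code` from the question, one possible sketch follows. It is a variation on the pattern above, not the only way to do it, and it rests on a few assumptions: the trigger phrase is matched case-insensitively via a scoped `(?i:...)` group while the code itself stays uppercase-only, the 10-character minimum is kept, and two lookaheads require at least one digit and one letter so that an all-caps word is not mistaken for a code. The name `PROMO_RE` is made up for this example.

```python
import re

# Assumptions: codes are at least 10 characters long, uppercase letters and
# digits only, and contain at least one digit and at least one letter.
PROMO_RE = re.compile(
    r"(?s)"                         # let . match newlines too
    r"(?i:Enter\s+(?:code|promo))"  # trigger phrase, case-insensitive only here
    r".*?"                          # skip as little text as possible
    r"\b(?=[A-Z]*\d)(?=\d*[A-Z])"   # require a digit and a letter in the token
    r"([A-Z\d]{10,})\b"             # the promo code itself
)

def extract_promo_code(body):
    """Return the first promo code after the trigger phrase, or None."""
    match = PROMO_RE.search(body)
    return match.group(1) if match else None

sample_1 = "Enter code at checkout.*\nOffer valid until October 6, 2023, 11:59pm CT MKEA15EMYZGP8W"
sample_2 = "Enter code JSB20GR335F4 Ends September 21, 2023, at 11:59pm CT.*"

print(extract_promo_code(sample_1))  # MKEA15EMYZGP8W
print(extract_promo_code(sample_2))  # JSB20GR335F4
print(extract_promo_code("Enter code at checkout"))  # None
```

If your codes can be shorter than 10 characters or contain only letters, relax the `{10,}` quantifier or drop the lookaheads accordingly.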