I have a list input:
['ICE ERIE', 'ERIE', 'o9 ManGo', 'ManGo SLACKCURRAN 120mL', 'SLACKCURRAN']
How can I extract the following string from it:
'ManGo SLACKCURRAN 120mL'
Another example:
Input:
['SWANSON', 'Apple Cider Vinegar Food Supplement Supplement mg per tablet DOUBLE STRENGTH FORMULA per tablet 1 NET', 'Cider', 'Vinegar', 'Food Supplement DOUBLE', 'Supplement', '200', 'per', 'tablet', 'DOUBLE', 'TABLETS 1 NET WEIGHT: 62g', '1', 'NET', 'WEIGHT:']
Output:
'TABLETS 1 NET WEIGHT: 62g'
My attempt:
import re

l = []
for each in input:
    # Keep items that consist solely of a number followed by a unit
    if re.match('^\\d+\\.?\\d*(ounce|fl oz|foot|sq ft|pound|gram|inch|sq in|mL)$', each.lower()):
        l.append(each)
You can use
import re
input_l = ['ICE ERIE', 'ERIE', 'o9 ManGo', 'ManGo SLACKCURRAN 120mL', 'SLACKCURRAN']
reg = re.compile(r'\d*\.?\d+\s*(?:ounce|fl oz|foot|sq ft|pound|gram|inch|sq in|ml)\b', re.I)
print( list(filter(reg.search, input_l)) )
# => ['ManGo SLACKCURRAN 120mL']
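Note that the second example from the question ends in 62g, and a bare g is not among the listed units, so the pattern above would not pick that item. The following is a minimal sketch, assuming g should be accepted as an extra unit (an assumption, not something the question states); the names input_2, reg_g and matches are only illustrative.

```python
import re

input_2 = ['SWANSON', 'Apple Cider Vinegar Food Supplement Supplement mg per tablet DOUBLE STRENGTH FORMULA per tablet 1 NET', 'Cider', 'Vinegar', 'Food Supplement DOUBLE', 'Supplement', '200', 'per', 'tablet', 'DOUBLE', 'TABLETS 1 NET WEIGHT: 62g', '1', 'NET', 'WEIGHT:']

# Same pattern as above, with a bare "g" appended as an assumed extra unit.
reg_g = re.compile(r'\d*\.?\d+\s*(?:ounce|fl oz|foot|sq ft|pound|gram|inch|sq in|ml|g)\b', re.I)

# Keep only the items that contain a number followed by one of the units.
matches = [s for s in input_2 if reg_g.search(s)]
print(matches)
# => ['TABLETS 1 NET WEIGHT: 62g']
```

Items such as '200' or '1 NET' are skipped because the number must be followed by a unit, and the trailing \b keeps g from matching the start of a longer word.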
Notes:
- Use re.search to search for matches anywhere inside the string (re.match only searches at the string start).
- Remove the ^ (start of string) and $ (end of string) anchors.
- Use the re.I flag for case insensitive matching.
- \d*\.?\d+ is a more convenient pattern to match either integer or float numbers, as it also supports numbers like .95.
- Use a raw string literal (the r prefix before the string literal).
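The points above can be checked with a short sketch. The patterns are shortened to a single unit (ml) for readability, and the variable names are only illustrative.

```python
import re

s = 'ManGo SLACKCURRAN 120mL'

# Anchored pattern with re.match: the whole string must be "number + unit".
anchored = re.match(r'^\d+\.?\d*ml$', s, re.I)

# Unanchored pattern with re.search: the match may occur anywhere in the string.
found = re.search(r'\d*\.?\d+\s*ml\b', s, re.I)

# Without re.I, the lowercase "ml" in the pattern does not match "mL".
case_sensitive = re.search(r'\d*\.?\d+\s*ml\b', s)

# Number part only: \d+\.?\d* needs a leading digit, \d*\.?\d+ does not.
old_number = re.fullmatch(r'\d+\.?\d*', '.95')
new_number = re.fullmatch(r'\d*\.?\d+', '.95')

print(anchored)            # None
print(found.group())       # 120mL
print(case_sensitive)      # None
print(old_number)          # None
print(new_number.group())  # .95
```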