I have a list of strings and I want to split each string on a floating point number. If there is no floating point number in the string, I want to split it on a number. It should only split once and return everything before and after it separated by commas.
Input list:
['Naproxen 500 Active ingredient Ph Eur',
'Croscarmellose sodium 22.0 mg Disintegrant Ph Eur',
'Povidone K90 11.0 Binder 56 Ph Eur',
'Water, purifieda',
'Silica, colloidal anhydrous 2.62 Glidant Ph Eur',
'Magnesium stearate 1.38 Lubricant Ph Eur']
Expected output:
['Naproxen', '500', 'Active ingredient Ph Eur',
'Croscarmellose sodium', '22.0 mg', 'Disintegrant Ph Eur',
'Povidone K90', '11.0', 'Binder Ph Eur',
'Water, purified',
'Silica, colloidal anhydrous', '2.62', 'Glidant Ph Eur',
'Magnesium stearate', '1.38', 'Lubricant Ph Eur']
Try this re.split option:
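import re

inp = 'Croscarmellose sodium 22.0 mg Disintegrant Ph Eur'
# Split once, on the first whitespace-delimited number; the capture group keeps the number in the result
parts = re.split(r'\s+(\d+(?:\.\d+)?)\s+', inp, maxsplit=1)
print(parts)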
This prints:
['Croscarmellose sodium', '22.0', 'mg Disintegrant Ph Eur']
The idea is to split on this regex pattern:
\s+(\d+(?:\.\d+)?)\s+
This matches a number, with an optional decimal component, surrounded by whitespace. Note that we place parentheses around the number: re.split includes the text of any capturing group in its result, so the number is kept as its own element instead of being discarded with the separator. Also note carefully that re.split is being used with maxsplit set to 1, which tells Python to split only once.
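To apply this to the whole list, and to prefer a floating point number over a plain integer as the question asks, one possible approach is to try a float-only pattern first and fall back to an integer pattern. This is only a sketch: the names FLOAT_RE, INT_RE and split_on_number are made up for illustration, and it does not try to reproduce every detail of the expected output (it leaves 'mg' and the trailing '56' where they are).

```python
import re

# Hypothetical helper patterns: a decimal number first, then a plain integer
FLOAT_RE = re.compile(r'\s+(\d+\.\d+)\s+')
INT_RE = re.compile(r'\s+(\d+)\s+')

def split_on_number(text):
    """Split once on the first float, else on the first integer, else not at all."""
    for pattern in (FLOAT_RE, INT_RE):
        parts = pattern.split(text, maxsplit=1)
        if len(parts) > 1:      # the pattern matched, so the split happened
            return parts
    return [text]               # no number found: keep the string whole

data = ['Naproxen 500 Active ingredient Ph Eur',
        'Povidone K90 11.0 Binder 56 Ph Eur',
        'Water, purifieda']

result = []
for item in data:
    result.extend(split_on_number(item))   # flatten into one list
print(result)
```

The 90 in 'K90' is skipped because it has no whitespace in front of it, so the split lands on 11.0.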