
Substitute substring with multiple characters


I am trying to make a string replacement using regex, but I am not able to achieve the expected result.

For example, I have the list of strings ['INSUTRIN 150G', 'NEUROZINE 30 ML', 'INSULITROL 30 CAPS'], and I would like to transform it into:

['INSUTRIN','NEUROZINE','INSULITROL']

I tried the following approach using regex:

import re
def modify_string(x):
    return re.sub("\d{1,}\s{0,}[[ML]|G|KG|CAPS]", "", x).strip()

and I get these results:

['INSUTRIN','NEUROZINE L','INSULITROL APS']

How should I write the regex to get the expected result?


Solution

  • If all you want is to remove the trailing number and unit, you can simply replace whatever matches \s\d+.* (a whitespace character, one or more digits, and everything after them) with an empty string in each string:

    import re
    
    def modify_string(x):
        return re.sub(r"\s\d+.*", "", x).strip()
    
    things = ['INSUTRIN 150G', 'NEUROZINE 30 ML', 'INSULITROL 30 CAPS']
    things = list(map(modify_string, things))
    print(things)
    

    Output:

    ['INSUTRIN', 'NEUROZINE', 'INSULITROL']
    

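    Additionally, here are two alternatives (both would replace the function above; the second one needs import re as well):

    # Keep only the first whitespace-separated word (assumes one-word names).
    def modify_string(x):
        return x.split()[0]
    
    # Remove the number only when one of the listed units follows it.
    # Note that .* still allows any characters between the number and the unit.
    def modify_string(x):
        return re.sub(r"\s\d+.*(ML|G|KG|CAPS)", "", x).strip()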
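
As for why the attempt in the question leaves stray letters behind: square brackets in a regex define a character class, which matches exactly one character, and a | inside the brackets is a literal pipe rather than "or". So only the first letter of the unit gets consumed, which would explain leftovers like 'L' and 'APS'. A minimal sketch of a stricter variant that uses a parenthesised group instead; the names UNIT_SUFFIX and strip_quantity are made up for this illustration, and the unit list is simply the one from the question:

```python
import re

# Hypothetical names for illustration; the unit list is taken from the question.
# (?:...) is a non-capturing group, so | really means "or" here. Inside [...]
# it would be a literal character and the class would match one character only.
UNIT_SUFFIX = re.compile(r"\s*\d+\s*(?:KG|ML|G|CAPS)\b")

def strip_quantity(name):
    """Remove a number that is followed by one of the known units."""
    return UNIT_SUFFIX.sub("", name).strip()

things = ['INSUTRIN 150G', 'NEUROZINE 30 ML', 'INSULITROL 30 CAPS']
print([strip_quantity(t) for t in things])
# ['INSUTRIN', 'NEUROZINE', 'INSULITROL']
```

Unlike \s\d+.*, this only touches digits that are followed by a listed unit, so for a made-up name such as 'OMEGA 3 30 CAPS' it returns 'OMEGA 3'. The trade-off is that any new unit has to be added to the list by hand.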