python pandas regex extract

Regex exact match


I have the following sentence: "The size of the lunch box is around 1.5l or 1500ml"

How can I change this to: "The size of the lunch box is around 1.5 liter or 1500 milliliter"

In some cases, the value might also be present as "1.5 l or 1500 ml" with a space.

I am not able to capture the "l" or "ml" when I try to build a function, or it gives me an escape error.

I tried:

import re

def stnd(text):
    text = re.sub('^l%', ' liter', text)
    text = re.sub('^ml%', ' milliliter', text)

    text = re.sub('^\d+\.\d+\s*l$', '^\d+\.\d+\s*liter$', text)
    text = re.sub('^^\d+\.\d+\s*ml$%', '^\d+\.\d+\s*milliliter$', text)

    return text
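A minimal check against the sample sentence shows why the attempt above does not work. `^` and `$` anchor to the start and end of the whole string, `%` is a literal character rather than a wildcard, and the second argument of `re.sub` is a replacement template, not a pattern, so regex escapes such as `\d` are rejected there. Writing patterns as raw strings (`r"..."`) also avoids the invalid-escape warning that recent Python versions emit for `'\d'` in a normal string literal.

```python
import re

text = "The size of the lunch box is around 1.5l or 1500ml"

# '^' anchors to the start of the whole string and '%' is a literal percent
# sign, so this pattern cannot match anywhere in the sentence.
unchanged = re.sub(r"^l%", " liter", text)
print(unchanged == text)  # True

# The second argument of re.sub is a replacement template, not a pattern.
# An escape such as \d is not allowed there, so re.sub raises re.error.
error = None
try:
    re.sub(r"^\d+\.\d+\s*l$", r"^\d+\.\d+\s*liter$", "1.5l")
except re.error as exc:
    error = exc
print("re.error:", error)
```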

Solution

  • You could use a dict with the units as keys, and a pattern that matches either ml or l directly after a digit. The matched unit is then the key used to look up the replacement word in the dict.

    (?<=\d)m?l\b
    

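    The pattern matches:

      • (?<=\d) Positive lookbehind, assert a digit directly to the left
      • m? Match an optional m
      • l Match l
      • \b A word boundary, so the unit is not the start of a longer word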


    Example

    import re

    s = "The size of the lunch box is around 1.5l or 1500ml"
    pattern = r"(?<=\d)m?l\b"
    dct = {
        "ml": "milliliter",
        "l": "liter"
    }
    # Look up the matched unit; if it is not in the dict, keep the matched text unchanged
    result = re.sub(pattern, lambda x: " " + dct[x.group()] if x.group() in dct else x.group(), s)
    print(result)
    

    Output

    The size of the lunch box is around 1.5 liter or 1500 milliliter
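The pattern above requires the unit to follow the digit directly, so it does not touch the spaced form "1.5 l or 1500 ml" mentioned in the question. Below is a minimal sketch of one way to cover both forms with the same dict approach: allow optional whitespace with `\s*` after the lookbehind, capture the unit in a group, and build the alternation from the dict keys so that adding a unit only means adding a dict entry. The names `UNITS`, `UNIT_PATTERN` and `expand_units` are illustrative, not part of the answer.

```python
import re

# Hypothetical unit table; extend it with more abbreviations as needed.
UNITS = {
    "ml": "milliliter",
    "l": "liter",
}

# Build the alternation from the dict keys, longest first, so a longer key is
# tried before a shorter key that is its prefix. re.escape guards against keys
# containing regex metacharacters.
_alternation = "|".join(sorted(map(re.escape, UNITS), key=len, reverse=True))

# (?<=\d)  a digit must come right before the match
# \s*      optional whitespace between the number and the unit ("1.5 l")
# (...)    capture the unit itself so it can be looked up in UNITS
# \b       the unit must end at a word boundary ("2 lunch" is not touched)
UNIT_PATTERN = re.compile(rf"(?<=\d)\s*({_alternation})\b")


def expand_units(text: str) -> str:
    """Replace a unit abbreviation that follows a number with its full name."""
    # The matched whitespace (if any) is replaced by exactly one space.
    return UNIT_PATTERN.sub(lambda m: " " + UNITS[m.group(1)], text)


# The original pattern finds nothing when a space separates number and unit.
print(re.findall(r"(?<=\d)m?l\b", "The size of the lunch box is around 1.5 l or 1500 ml"))  # []

print(expand_units("The size of the lunch box is around 1.5l or 1500ml"))
print(expand_units("The size of the lunch box is around 1.5 l or 1500 ml"))
```

For a pandas column (a hypothetical `df["text"]`), `df["text"].apply(expand_units)` should apply the same replacement row by row; `Series.str.replace` also accepts a compiled pattern with a callable replacement when `regex=True`.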