pythonpandasjupyternumericcustom-data-type

Ounces to Pounds in Python


I have a column of weights (there are actually 5k weights), a small batch looks like this:

weight
15.00 oz
19.00 oz
2 lb 15.00 oz 
1 lb 19.00 oz

What I wanted to do is to convert the whole weight to pounds, something like this:

weight
0.9375 lb
1.1875 lb
2.9375 lb
2.1875 lb

What shall I do to achieve this? What I have tried so far is:

df[['lbs','oz']] = df.Weight.str.split("lb",expand=True)

but this doesn't work, as for rows with no 'lb' unit, the code doesn't work. that looked like this:

pounds    ounces
 15.00    oz
 19.00    oz
 2lb      15.00oz
 1lb      19.00oz

Solution

  • This works, but there's almost certainly a neater 'more pandas' way to do it... This should be fast enough to process 5,000 values.

    Imports:

    import pandas as pd
    

    Test data set-up (including the data with oz values after the .):

    df = pd.DataFrame(["15.00 oz",
                       "19.00 oz",
                       "2 lb 15.00 oz",
                       "1 lb 19.00 oz",
                       "1 lb 12.80 oz",
                       "1 lb",
                       "nothing"],
                       columns=["weight"])
    

    Produces:

              weight
    0       15.00 oz
    1       19.00 oz
    2  2 lb 15.00 oz
    3  1 lb 19.00 oz
    4  1 lb 12.80 oz
    5           1 lb
    6        nothing
    

    Define a function to map from the individual lb/oz values to a single lb value. This takes an array of tuples, which may be empty, such as: [(,'15.00')] or [] or [('1', '12.80')] (The 'numbers' in the matches are still of type str at this point):

    def lbsFromMatchedNumbers(matchData):
        if len(matchData) == 0:
            return None
        (lbs, oz) = matchData[0]
        lbs = float(lbs or 0)
        oz = float(oz or 0)
        ounces_in_pound = 16.0
        return lbs + (oz / ounces_in_pound)
    

    Find all the items in the 'weight' row, and then process them with the function, and assign to new 'lb' column:

    matchPattern = "^(?:(\d+) lb ?)?(?:(\d+(?:.\d+)?) oz)?$"
    df["lb"] = df["weight"].str.findall(matchPattern).apply(lbsFromMatchedNumbers)
    

    Produces:

              weight      lb
    0       15.00 oz  0.9375
    1       19.00 oz  1.1875
    2  2 lb 15.00 oz  2.9375
    3  1 lb 19.00 oz  2.1875
    4  1 lb 12.80 oz  1.8000
    5           1 lb  1.0000
    6        nothing     NaN
    

    Note: This works if there are only lb or oz numbers, as shown in extra rows in the sample data I've used. If there's neither, it produces NaN.


    Explanation of the regex part

    We're using a regex ('regular expression') to match the portions of the 'weight' text content using this pattern: "^(?:(\d+) lb ?)?(?:(\d+(?:.\d+)?) oz)?$"

    Regex syntax used

    Our particular example regex

    Putting that together, this regex basically says: