pandas dataframe numpy python-3.12

Python Pandas: extract "range" object column values into 2 other columns with int type


I have a column with dtype = object. Each value is either NaN or a range string such as 123 - 456. I need to extract the min and max into their own columns:

import numpy
df["min"] = df["values"].map(lambda x: minimum if x and (minimum:=str(x).split(" - ")[0]) else numpy.nan)
df["max"] = df["values"].map(lambda x: maximum if x and " - " in str(x) and (maximum:=str(x).split(" - ")[1]) else numpy.nan)

(1) Is this the most efficient way?

(2) This code snippet results in string column types. How can I obtain an int type?


Solution

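  • Use Series.str.extract with one capture group per number and convert the output to Int64, the nullable integer dtype (plain int64 cannot hold the missing values):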

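    import numpy as np
    import pandas as pd

    df = pd.DataFrame({'values': [np.nan, '123 - 456', '4 - 9']})

    # Each (\d+) group becomes one output column; NaN rows stay missing
    df[['min', 'max']] = df["values"].str.extract(r'(\d+)\s*-\s*(\d+)').astype('Int64')

    print(df)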
          values   min   max
    0        NaN  <NA>  <NA>
    1  123 - 456   123   456
    2      4 - 9     4     9
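
As a variation on the approach above, here is a minimal sketch that assumes the same sample data. It uses named capture groups so the extracted columns are already called min and max, then joins them back onto the frame. The variable name `bounds` is only illustrative and is not part of the original answer.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({"values": [np.nan, "123 - 456", "4 - 9"]})

# Named groups (?P<name>...) become the column names of the extracted frame.
bounds = df["values"].str.extract(r"(?P<min>\d+)\s*-\s*(?P<max>\d+)")

# "Int64" (capital I) is the nullable integer dtype; missing rows become <NA>.
bounds = bounds.astype("Int64")

# Attach the new columns next to the original one (indexes already match).
df = df.join(bounds)

print(df.dtypes)
```

Both new columns should come out as Int64, with the NaN row kept as a missing value rather than being turned into the string "nan". Whether this is faster than the map/lambda version depends on the data size, so it is worth timing on a real column.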