I have a column with dtype = object. Each value is either NaN or a string such as 123 - 456. I need to extract the min and max into their own columns:
import numpy
df["min"] = df["values"].map(lambda x: minimum if x and (minimum:=str(x).split(" - ")[0]) else numpy.nan)
df["max"] = df["values"].map(lambda x: maximum if x and " - " in str(x) and (maximum:=str(x).split(" - ")[1]) else numpy.nan)
(1) Is this the most efficient way?
(2) This code snippet results in string column types. How can I obtain an int type?
Use Series.str.extract and convert the output to Int64:
import numpy as np
import pandas as pd
df = pd.DataFrame({'values': [np.nan, '123 - 456', '4 - 9']})
df[['min', 'max']] = df['values'].str.extract(r'(\d+)\s*-\s*(\d+)').astype('Int64')
print(df)
      values   min   max
0        NaN  <NA>  <NA>
1  123 - 456   123   456
2      4 - 9     4     9
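As a rough illustration of why the nullable Int64 dtype is used here rather than plain int, the sketch below goes through an intermediate numeric step. It assumes the same sample data as above; the names parts, as_float and as_nullable are made up for this example, and the intermediate dtype may differ between pandas versions.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({"values": [np.nan, "123 - 456", "4 - 9"]})

# Extract both numbers as strings; the NaN row has no match and stays missing.
parts = df["values"].str.extract(r"(\d+)\s*-\s*(\d+)")

# Parse the digit strings into numbers. With object-dtype strings this
# typically yields float64, because NumPy integers cannot hold a missing value.
as_float = parts.apply(pd.to_numeric)

# The nullable "Int64" extension dtype stores integers and marks the
# missing row as <NA> instead of forcing a float column.
as_nullable = as_float.astype("Int64")

print(as_nullable.dtypes)
```

A plain int column would only be possible after dropping or filling the missing rows, so Int64 is the usual choice when NaN has to be preserved.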