I have a dataframe in which missing values are stored as blank spaces, so I replaced them with NaN using a regex. The problem appears when I try to use ordinal encoding to replace the categorical values. My code so far is the following:
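import numpy as np
import pandas as pd
from sklearn import preprocessing

# np.array coerces every value (including age) to a string
x = pd.DataFrame(np.array([30, "lawyer", "France",
                           25, "clerk", "Italy",
                           22, " ", "Germany",
                           40, "salesman", "EEUU",
                           34, "lawyer", " ",
                           50, "salesman", "France"]
                          ).reshape(6, 3))
x.columns = ["age", "job", "country"]
# Replace empty or whitespace-only strings with NaN
x = x.replace(r'^\s*$', np.nan, regex=True)
oe = preprocessing.OrdinalEncoder()
x["job"] = oe.fit_transform(x["job"].values.reshape(-1, 1))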
I got the following error:
Input contains NaN
I would like the job column to be replaced with numbers such as [1, 2, -1, 3, 1, 3], with -1 marking the missing value.
You can try factorize; note that here the category codes start at 0 and the missing value gets -1:
x.job.mask(x.job==' ').factorize()[0]
Out[210]: array([ 0, 1, -1, 2, 0, 2], dtype=int32)
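To get exactly the numbering asked for in the question (categories starting at 1, missing values at -1), the factorize codes can be shifted while leaving -1 alone. This is a minimal sketch, assuming only pandas and numpy. Keep in mind that factorize numbers categories in order of first appearance, which is why "lawyer" ends up as 1. If the ranking itself matters (a true ordinal encoding), you can fix the order with pd.Categorical instead; the order ["clerk", "lawyer", "salesman"] below is a hypothetical example, not something from the question.

```python
import numpy as np
import pandas as pd

# Same data as in the question, built column-wise for readability
x = pd.DataFrame({
    "age": [30, 25, 22, 40, 34, 50],
    "job": ["lawyer", "clerk", " ", "salesman", "lawyer", "salesman"],
    "country": ["France", "Italy", "Germany", "EEUU", " ", "France"],
})
# Replace empty or whitespace-only strings with NaN
x = x.replace(r"^\s*$", np.nan, regex=True)

# factorize numbers categories by first appearance and gives NaN the code -1
codes, uniques = pd.factorize(x["job"])

# Shift real categories so they start at 1, keep -1 for missing values
x["job_code"] = np.where(codes >= 0, codes + 1, -1)
print(x["job_code"].tolist())  # [1, 2, -1, 3, 1, 3]

# Explicit ordering: the order below is a hypothetical assumption
order = ["clerk", "lawyer", "salesman"]
cat_codes = pd.Categorical(x["job"], categories=order, ordered=True).codes
# Categorical also codes NaN as -1, so the same shift applies
x["job_rank"] = np.where(cat_codes >= 0, cat_codes + 1, -1)
print(x["job_rank"].tolist())  # [2, 1, -1, 3, 2, 3]
```

The factorize route is enough when the numbers are only labels; the Categorical route is closer to what OrdinalEncoder is meant for, since the codes then reflect an order you chose rather than the row order of the data.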