I am trying to build a prediction model but currently keep getting an error: raise ValueError("Input contains NaN") ValueError: Input contains NaN
. I tried to use np.any(np.isnan(dataframe))
and np.any(np.isnan(dataframe))
, but I just keep getting new errors. For example, TypeError: ufunc 'isfinite' not supported for the input types, and the inputs could not be safely coerced to any supported types according to the casting rule ''safe''
.
Here is the code so far:
import pandas as pd
from sklearn.preprocessing import LabelEncoder
import numpy as np
dataframe = pd.read_csv('file.csv', delimiter=',')
le = LabelEncoder()
dfle = dataframe
dfle2 = dfle.apply(lambda col: le.fit_transform(col.astype(str)), axis=0, result_type='expand')
newdf = dfle2[['column1', 'column2', 'column3', 'column4', 'column5', 'column6', 'column7']]
X = dataframe[['column1', 'column2', 'column4', 'column5', 'column6', 'column7']].values
y = dfle.column3
from sklearn.preprocessing import OneHotEncoder
from sklearn.compose import ColumnTransformer
ohe = OneHotEncoder()
ColumnTransformer([('encoder', OneHotEncoder(), [0])], remainder='passthrough')
# np.all(np.isfinite(dfle))
# np.any(np.isnan(dfle))
X = ohe.fit_transform(X).toarray()
You can do multiple things to deal with this error
first, you can fill the Nan values by 0 dataframe = pd.read_csv('file.csv', delimiter=',').fillna(0)
or you can use sklearn
imputation techniques to fill the Nan value.
https://scikit-learn.org/stable/modules/classes.html#module-sklearn.impute
Multiple Imputation techniques are available but you should use KNNImputer
.