scikit-learn imputets

Imputing with sklearn's SimpleImputer raises a ValueError


I tried the code below, but I get an error.

imp=SimpleImputer(missing_values='NaN',strategy="mean")
col = veriler.iloc[:,1:4].values
type(col) ##numpy.ndarray
imp=imp.fit(col)

ValueError: Input contains NaN, infinity or a value too large for dtype('float64').


Solution

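  • SimpleImputer rejects infinite values, so you need to convert them to finite values before imputing. np.nan_to_num replaces nan, inf and -inf with finite values of your choosing (the nan, posinf and neginf keyword arguments require NumPy 1.17 or later). Note that once nan is replaced with a sentinel such as -9999, the imputer no longer treats those entries as missing.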

    For example:

    import numpy as np
    from sklearn.impute import SimpleImputer
    imp_mean = SimpleImputer(missing_values=np.nan, strategy='mean')
    X = [[7, np.inf, 3], [4, np.nan, 6], [10, 5, 9]]
    X = np.nan_to_num(X, nan=-9999, posinf=33333333, neginf=-33333333)
    imp_mean.fit(X)
    >>> SimpleImputer(add_indicator=False, copy=True, fill_value=None,
                  missing_values=nan, strategy='mean', verbose=0)
    

    The same preprocessing can be applied before calling transform:

    X = [[np.nan, 2, 3], [4, np.nan, 6], [10, np.nan, 9], [np.nan, np.inf, -np.inf]]
    X = np.nan_to_num(X, nan=-9999, posinf=33333333, neginf=-33333333)
    print(imp_mean.transform(X))
    
    >>>
    [[-9.9990000e+03  2.0000000e+00  3.0000000e+00]
     [ 4.0000000e+00 -9.9990000e+03  6.0000000e+00]
     [ 1.0000000e+01 -9.9990000e+03  9.0000000e+00]
     [-9.9990000e+03  3.3333333e+07 -3.3333333e+07]]
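
If the goal is to have the infinite entries imputed with the column mean rather than replaced by fixed sentinels, one alternative is to turn inf and -inf into np.nan first and let SimpleImputer handle everything. The sketch below is only an illustration on a toy array (not the asker's veriler data). It also passes missing_values=np.nan rather than the string 'NaN' used in the question, which is a likely cause of the original error.

```python
import numpy as np
from sklearn.impute import SimpleImputer

# Toy data (an assumption for illustration): one inf and one nan in column 1.
X = np.array([[7, np.inf, 3],
              [4, np.nan, 6],
              [10, 5, 9]], dtype=float)

# Mark infinite entries as missing so the imputer can fill them.
X[np.isinf(X)] = np.nan

# Use np.nan (not the string 'NaN') as the missing-value marker.
imp_mean = SimpleImputer(missing_values=np.nan, strategy="mean")
X_imputed = imp_mean.fit_transform(X)

print(X_imputed)
```

Here the only observed value in the middle column is 5, so both the former inf and the nan are filled with 5.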