python machine-learning dask dask-ml

Impute mean of single column in dask-ml


Calculating and imputing the mean with dask-ml works fine when every column containing np.nan should be filled:

import numpy as np
import pandas as pd
from dask_ml import impute

imputer = impute.SimpleImputer(strategy='mean')
data = [[100, 2], [np.nan, np.nan], [70, 7]]
df = pd.DataFrame(data, columns=['Weight', 'Age'])
x3 = imputer.fit_transform(df)
print(x3)

   Weight  Age
0   100.0  2.0
1    85.0  4.5
2    70.0  7.0

But what if I need to leave Age untouched? Is it possible to specify which columns to impute?


Solution

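  • You should be able to restrict imputation to specific columns by passing only those columns to the imputer and assigning the result back, e.g. df[['Weight']] = imputer.fit_transform(df[['Weight']]). The double brackets keep the selection a one-column DataFrame, which is the shape the imputer expects. Selecting with df.loc[:, ['Weight']] is equivalent. Note that df.loc["Weight"] looks up a row label, not a column, so it will not work here.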