I created a set of random missing values to practice with a tree imputer. However, I'm stuck on how to write the imputed values back into my dataframe in place of the missing ones. This is what I have so far:
from sklearn.experimental import enable_iterative_imputer
from sklearn.impute import IterativeImputer
df_post_copy = df_post.copy()
missing_mask = df_post_copy.isna()
imputer = IterativeImputer(max_iter=10, random_state=0)
imputed_values = imputer.fit_transform(df_post_copy)
df_post_copy[missing_mask] = imputed_values[missing_mask]
Results in:
ValueError: other must be the same shape as self when an ndarray
But the shape matches...
imputed_values.shape
(16494, 29)
The type is:
type(imputed_values)
numpy.ndarray
Since the shape is right, I tried converting it to a pandas DataFrame:
test_imputed_values = pd.DataFrame(imputed_values)
When I try:
df_post_copy[missing_mask] = test_imputed_values[missing_mask]
I get the same error as above.
How do I use a mask to insert the imputed values where needed?
imputer.fit_transform(...)
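returns the complete array: the original values unchanged, with imputed values in place of the (previously) missing ones. Indexing that array with a 2-D boolean mask flattens it to a 1-D array of just the selected cells, which is why the assignment complains about the shape even though the full array matches. If you want an updated DataFrame, something like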
imputed_values = imputer.fit_transform(df_post_copy)
df_post_copy.loc[:, :] = imputed_values
should work.
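For completeness, here is a minimal sketch of both routes on made-up data: overwriting the whole frame as above, and using the mask explicitly if you prefer that. The small random `df_post` and the names `flat`, `full`, and `masked` are assumptions for illustration only, not your real data. It assumes every column is numeric and has at least one observed value, so the imputer's output keeps the same shape as the frame.

```python
import numpy as np
import pandas as pd
from sklearn.experimental import enable_iterative_imputer  # noqa: F401 (needed before the next import)
from sklearn.impute import IterativeImputer

# Hypothetical stand-in for df_post: 50 rows, 4 numeric columns, roughly 10% missing.
rng = np.random.default_rng(0)
df_post = pd.DataFrame(rng.normal(size=(50, 4)), columns=list("abcd"))
df_post = df_post.mask(rng.random(df_post.shape) < 0.1)

df_post_copy = df_post.copy()
missing_mask = df_post_copy.isna()

imputer = IterativeImputer(max_iter=10, random_state=0)
imputed_values = imputer.fit_transform(df_post_copy)

# Boolean indexing flattens the array: `flat` is 1-D with one entry per
# missing cell, so it no longer has the 2-D shape of the frame.
flat = imputed_values[missing_mask.to_numpy()]
print(imputed_values.shape, flat.shape)

# Route 1: overwrite every cell; observed cells come back unchanged.
full = df_post.copy()
full.loc[:, :] = imputed_values

# Route 2: use the mask explicitly. `other` is the full 2-D array, so it
# has the same shape as the frame and only the masked cells are replaced.
masked = df_post.mask(missing_mask, imputed_values)
```

Both routes should end up with the same frame. The second one is only worth it if you want the replacement to be visibly restricted to the cells that were missing.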