python pandas chained-assignment

I got around a SettingWithCopyWarning, but it feels like the wrong way and computationally inefficient. Is there a better way?


I encountered the ever-common SettingWithCopyWarning when trying to change some values in a DataFrame. I found a way to get around this without having to disable the warning, but I feel like I've done it the wrong way, and that it is needlessly wasteful and computationally inefficient.

label_encoded_feature_data_to_be_standardised_X_train = X_train_label_encoded[['price', 'vintage']]
label_encoded_feature_data_to_be_standardised_X_test = X_test_label_encoded[['price', 'vintage']]
label_encoded_standard_scaler = StandardScaler()
label_encoded_standard_scaler.fit(label_encoded_feature_data_to_be_standardised_X_train)

X_train_label_encoded_standardised = label_encoded_standard_scaler.transform(label_encoded_feature_data_to_be_standardised_X_train)
X_test_label_encoded_standardised = label_encoded_standard_scaler.transform(label_encoded_feature_data_to_be_standardised_X_test)

That's how it's set up. I then get the warning if I do this:

X_train_label_encoded.loc[:,'price'] = X_train_label_encoded_standardised[:,0]

or if I do this:

X_train_label_encoded_standardised_df = pd.DataFrame(data=X_train_label_encoded_standardised, columns=['price', 'vintage'])

And I solved it by doing this:

X_train_label_encoded = X_train_label_encoded.drop('price', axis=1)
X_train_label_encoded['price'] = X_train_label_encoded_standardised_df.loc[:,'price']

This also works:

X_train_label_encoded.replace(to_replace=X_train_label_encoded['price'], value=X_train_label_encoded_standardised_df['price'])

But even that feels overly clunky with the extra DataFrame creation.

Why can't I just assign the column in some way? Or using some arrangement of the replace method? The documentation doesn't seem to have a solution, or am I just reading it wrong? Missing some obvious but not spelled out solution?

Is there a better way of doing this?


Solution

  • Many times, this warning is just a warning. If your code works and you aren't using chained assignment, you often have nothing to worry about.

    If your transformation maintains the index, including order, and your data is numeric, you can use pd.DataFrame.values:

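    X_train_label_encoded['price'] = X_train_label_encoded_standardised_df.values[:, 0]

    This should sidestep the warning since X_train_label_encoded_standardised_df.values evaluates to a lower-level NumPy array. Note that X_train_label_encoded_standardised itself is the output of StandardScaler.transform, which by default is already a NumPy array, so it has no .values attribute and can be indexed directly with [:, 0].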
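
A common cause of this warning is that the DataFrame being written to is itself a slice of another DataFrame. The question doesn't show how X_train_label_encoded was created, so that is only an assumption here. Under that assumption, a minimal sketch would be to take an explicit .copy() when the frame is created and then assign both scaled columns in one step, with no intermediate DataFrame. The `raw` frame, its values, and the `cols` and `scaler` names are made up for illustration:

```python
import pandas as pd
from sklearn.preprocessing import StandardScaler

# Hypothetical stand-in for the asker's label-encoded data.
raw = pd.DataFrame({
    "price": [10.0, 20.0, 30.0, 40.0],
    "vintage": [2001.0, 2005.0, 2010.0, 2015.0],
    "country": [0, 1, 0, 2],
})

# Assumption: the train/test frames are slices of a larger frame.
# An explicit .copy() makes each one an independent DataFrame, so later
# column assignments are not writes into a slice of `raw`.
X_train_label_encoded = raw.iloc[:3].copy()
X_test_label_encoded = raw.iloc[3:].copy()

cols = ["price", "vintage"]

# Fit on the training columns only, as in the question.
scaler = StandardScaler().fit(X_train_label_encoded[cols])

# transform() returns a NumPy array by default; assigning it to the
# column list overwrites both columns positionally in one step.
X_train_label_encoded[cols] = scaler.transform(X_train_label_encoded[cols])
X_test_label_encoded[cols] = scaler.transform(X_test_label_encoded[cols])
```

Because the assignment is positional, this relies on the same condition stated above: the transformation must preserve row order.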