Tags: python, pandas, dataframe, pandas-settingwithcopy-warning

How to fix SettingWithCopyWarning? (Python)


I have a dataframe with some columns of the same type:

['total_tracks', 't_dur0', 't_dur1', 't_dur2', 't_dance0', 't_dance1', 't_dance2', 
 't_energy0', 't_energy1', 't_energy2', 't_key0', 't_key1', 't_key2', 't_mode0', 
 't_mode1', 't_mode2', 't_speech0', 't_speech1', 't_speech2', 't_acous0', 't_acous1', 
 't_acous2', 't_ins0', 't_ins1', 't_ins2', 't_live0', 't_live1', 't_live2', 't_val0', 
 't_val1', 't_val2', 't_tempo0', 't_tempo1', 't_tempo2', 't_sig0', 't_sig1', 't_sig2', 
 'popularity', 'release_year', 'release_month']

And I am trying to combine the columns with the same type like this:

# Takes in a dataframe with three columns and returns a dataframe with one column of their means
def average_column(dataframe):
    dataframe["mean"] = dataframe.mean(axis=1)                        # Add column to the dataframe (axis=1 means the mean() is applied row-wise)
    mean_df = dataframe.iloc[: , -1:]                                 # Isolated column of the mean by selecting all rows (:) for the last column (-1:)
    print("Original: {}\tWith mean:\n{}".format(dataframe, mean_df))
    return mean_df

The function was inspired by two other questions. I tried to run this code:

t_name_df = df[["t_dur0", "t_dur1", "t_dur2"]]
print(t_name_df.columns.tolist())
average_column(t_name_df)

Which gave me this output:

['t_dur0', 't_dur1', 't_dur2']
Original:
      t_dur0  t_dur1  t_dur2         mean
0       2315    2310    2293  2306.000000
1       1558     886    1870  1438.000000
2        803     316     504   541.000000
3        498     815     677   663.333333
4       1508    1677    1386  1523.666667
...      ...     ...     ...          ...
[2833 rows x 4 columns]
With mean:
         mean
0     2306.000000
1     1438.000000
2      541.000000
3      663.333333
4     1523.666667
...           ...

A value is trying to be set on a copy of a slice from a DataFrame.
Try using .loc[row_indexer,col_indexer] = value instead

To get rid of the warning I tried re-writing it:

t_name_df = df.loc['t_dur0', 't_dur0']
print(t_name_df.column.tolist())
average_column(t_name_df)

Which gave me this error:

KeyError: 't_dur0'

How do I get rid of this warning correctly?


Solution

  • Change your average_column function to this:

    def average_column(dataframe):
        # ADD THIS LINE:
        dataframe = dataframe.copy()
        
        dataframe["mean"] = dataframe.mean(axis=1)                        # Add column to the dataframe (axis=1 means the mean() is applied row-wise)
        mean_df = dataframe.iloc[: , -1:]                                 # Isolated column of the mean by selecting all rows (:) for the last column (-1:)
        print("Original: {}\tWith mean:\n{}".format(dataframe, mean_df))
        return mean_df
    

    The warning appears because t_name_df = df[["t_dur0", "t_dur1", "t_dur2"]] produces a new dataframe derived from df, and pandas cannot always tell whether that object is a view of df or a copy of it. When you then assign a new column to it inside average_column, pandas warns that the change may not be reflected in the original dataframe (df). Calling .copy() gives the function its own independent dataframe, so pandas knows the assignment is intentional and the warning goes away.
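
As an alternative to copying, the mean can be computed without assigning to the selected columns at all, so there is nothing for pandas to warn about. The sketch below is a minimal example under assumptions: the small df is a hypothetical stand-in for the asker's data (three rows taken from the output in the question, plus one extra column), and this version of average_column returns the means directly instead of adding a "mean" column first. It also shows the .loc form for selecting columns by label. The KeyError in the question's second attempt occurs because .loc expects row labels first, so df.loc['t_dur0', 't_dur0'] looks for a row labelled 't_dur0'.

```python
import pandas as pd

# Hypothetical stand-in for the asker's dataframe (values taken from the question's output).
df = pd.DataFrame({
    "t_dur0": [2315, 1558, 803],
    "t_dur1": [2310, 886, 316],
    "t_dur2": [2293, 1870, 504],
    "popularity": [50, 60, 70],
})


def average_column(dataframe):
    """Return a one-column dataframe of row-wise means without modifying the input."""
    # mean(axis=1) returns a Series; to_frame turns it into a dataframe named "mean".
    return dataframe.mean(axis=1).to_frame(name="mean")


# With .loc, ":" selects all rows and the list selects columns by label.
t_name_df = df.loc[:, ["t_dur0", "t_dur1", "t_dur2"]]
mean_df = average_column(t_name_df)
print(mean_df)
```

Because nothing is written to t_name_df, it keeps its three original columns, and the explicit .copy() is only needed if you do want to add columns to the selection later.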