Creating new dataframe from existing - SettingWithCopyWarning

I have a CSV file that I import as a dataframe. This dataframe goes through multiple filtering steps. Data is also moved between columns based on conditions.

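import numpy as np
import pandas as pd

# headers: list of column names (not shown)
df = pd.read_csv('my_csv_file.csv', names=headers)
# Keep only the first row for each Column_X value
df2 = df.drop_duplicates(['Column_X'])
# Copy Column_X into Column_Z for rows where Column_Y is 'Category1'
series1 = df2.loc[df2['Column_Y'] == 'Category1', 'Column_X']
df2.loc[df2['Column_Y'] == 'Category1', 'Column_Z'] = series1
...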

When I run the last assignment (the df2.loc[...] = series1 line), I get a SettingWithCopyWarning:

SettingWithCopyWarning:
A value is trying to be set on a copy of a slice from a DataFrame.
Try using .loc[row_indexer, col_indexer] = value instead.

Note that I am already using .loc, as the warning suggests.

Running the same assignment on df instead does not produce the warning:

df.loc[df['Column_Y'] == 'Category1', 'Column_Z'] = series1

This makes me think the problem lies in how df2 was derived from df.
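One way to check whether df2 actually shares data with df is to write into df2 and then look at df. This is a minimal sketch: the small frame below is a made-up stand-in for my_csv_file.csv (only the column names come from the question), and warnings are silenced only so the output stays readable.

```python
import warnings

import pandas as pd

# Made-up stand-in for my_csv_file.csv; only the column names come from the question.
df = pd.DataFrame({
    "Column_X": ["a", "a", "b", "c"],
    "Column_Y": ["Category1", "Category1", "Category2", "Category1"],
    "Column_Z": ["", "", "", ""],
})

df2 = df.drop_duplicates(["Column_X"])  # note: no .copy()
mask = df2["Column_Y"] == "Category1"

with warnings.catch_warnings():
    # Hide the SettingWithCopyWarning (if this pandas version emits it).
    warnings.simplefilter("ignore")
    df2.loc[mask, "Column_Z"] = df2.loc[mask, "Column_X"]

print(df2["Column_Z"].tolist())  # ['a', '', 'c']
print(df["Column_Z"].tolist())   # ['', '', '', '']
```

The original df is left unchanged, so the write did not go through to df. As far as I can tell, the warning means pandas cannot be sure df2 is independent of df, not that df2 is definitely a view.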

Solution

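I believe the issue is that pandas treats df2 as a possible view of df, because df2 was derived from df by drop_duplicates. It therefore cannot tell whether an assignment into df2 is also meant to change df. Putting .copy() at the end of the drop_duplicates call makes df2 an explicit, independent dataframe, and the warning goes away: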

df2 = df.drop_duplicates(['Column_X']).copy()
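A quick way to confirm the fix is to record any warnings raised during the assignment. This is a sketch using the same made-up stand-in data as above (only the column names come from the question); checking the warning by class name is a choice made here so the check does not depend on where a given pandas version defines SettingWithCopyWarning.

```python
import warnings

import pandas as pd

# Made-up stand-in for my_csv_file.csv; only the column names come from the question.
df = pd.DataFrame({
    "Column_X": ["a", "a", "b", "c"],
    "Column_Y": ["Category1", "Category1", "Category2", "Category1"],
    "Column_Z": ["", "", "", ""],
})

# .copy() gives df2 its own data, so pandas no longer treats it as derived from df.
df2 = df.drop_duplicates(["Column_X"]).copy()

with warnings.catch_warnings(record=True) as caught:
    warnings.simplefilter("always")  # record every warning raised in this block
    mask = df2["Column_Y"] == "Category1"
    df2.loc[mask, "Column_Z"] = df2.loc[mask, "Column_X"]

# Names of the warning categories raised during the assignment.
raised = [w.category.__name__ for w in caught]
print("SettingWithCopyWarning" in raised)  # False
print(df2["Column_Z"].tolist())  # ['a', '', 'c']
```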