I have 50 .csv files, with over 188k rows combined, and I need to add the file name to each row so that I can track which file it came from. The code I am using is below; it combines the files into a single df.
df = pd.DataFrame()
for file in files:
    if file.endswith('.csv'):
        df = df.append(pd.read_csv(file), ignore_index=True)
df.head()
You're almost there. Instead of appending the result of read_csv() directly, store it in a variable and add a new column with the file name:
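import pandas as pd

df = pd.DataFrame()
for file in files:
    if file.endswith('.csv'):
        df_new = pd.read_csv(file)
        df_new['from_file'] = file  # record which file each row came from
        # DataFrame.append was removed in pandas 2.0; pd.concat does the same job
        df = pd.concat([df, df_new], ignore_index=True)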
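Growing a DataFrame inside the loop copies the accumulated rows on every iteration, which can get slow with many files. A common alternative is to collect the per-file frames in a list and concatenate once at the end. The sketch below is one way to do that; the helper name `combine_csvs` and the two sample files written to a temporary directory are hypothetical and exist only so the example runs on its own.

```python
import os
import tempfile

import pandas as pd


def combine_csvs(files):
    """Read each CSV in `files` and tag its rows with the source path."""
    frames = []
    for file in files:
        if file.endswith('.csv'):
            df_new = pd.read_csv(file)
            df_new['from_file'] = file  # the scalar is broadcast to every row
            frames.append(df_new)
    if not frames:
        # pd.concat raises ValueError on an empty list, so return early
        return pd.DataFrame()
    # A single concat at the end avoids re-copying the data on every iteration
    return pd.concat(frames, ignore_index=True)


# Hypothetical sample data, only so the sketch runs end to end
tmp_dir = tempfile.mkdtemp()
files = []
for name, text in [('a.csv', 'x,y\n1,2\n3,4\n'), ('b.csv', 'x,y\n5,6\n')]:
    path = os.path.join(tmp_dir, name)
    with open(path, 'w') as fh:
        fh.write(text)
    files.append(path)

df = combine_csvs(files)
print(df)
```

With your own data you would pass your existing `files` list instead of the sample paths.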
Also, if your file variable is actually the full path to the file, you can use os.path.basename(file), which returns only the name of the file, without the path.
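To illustrate what os.path.basename does, here is a small sketch. The paths are made up for the example and use forward slashes; os.path follows the path conventions of the operating system it runs on, so Windows-style backslash paths are only split correctly when run on Windows.

```python
import os

# Hypothetical paths, one with directories and one without
paths = ['data/2021/sales_01.csv', 'sales_02.csv']

# basename drops everything up to and including the last separator
names = [os.path.basename(p) for p in paths]
print(names)  # ['sales_01.csv', 'sales_02.csv']
```

In the loop above, that would mean assigning os.path.basename(file) to the from_file column instead of file.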