python pandas dataframe csv

Adding a column to multiple .csv files with the file name as you combine those .csv files into a single dataframe


I have 50 .csv files with over 188k rows combined, and I need to add the file name to each row so that I can track which file it came from. The code I am currently using is below; it combines the files into a single df.

df = pd.DataFrame()
for file in files:
    if file.endswith('.csv'):
        df=df.append(pd.read_csv(file), ignore_index=True)
df.head()

Solution

  • You're almost there. Instead of appending the result of read_csv() directly, store it first and add a new column holding the file name:

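    frames = []
    for file in files:
        if file.endswith('.csv'):
            df_new = pd.read_csv(file)
            df_new['from_file'] = file  # tag every row with its source file
            frames.append(df_new)
    # DataFrame.append was removed in pandas 2.0, so combine with pd.concat
    df = pd.concat(frames, ignore_index=True)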
    

    Also, if your file variable is actually the whole path to the file, you can use os.path.basename(file), which returns only the name of the file, without the path.
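
To see the basename variant end to end, here is a minimal, self-contained sketch. The helper name combine_csvs, the throwaway files a.csv and b.csv, and the column x are all made up for the illustration; your own files list takes the place of paths.

```python
import os
import tempfile

import pandas as pd


def combine_csvs(paths):
    """Read each CSV in `paths` and tag its rows with the source file name."""
    frames = []
    for path in paths:
        if path.endswith('.csv'):
            frame = pd.read_csv(path)
            # basename drops the directory part, e.g. '/tmp/x/a.csv' -> 'a.csv'
            frame['from_file'] = os.path.basename(path)
            frames.append(frame)
    # A single concat at the end avoids re-copying the growing frame each loop
    return pd.concat(frames, ignore_index=True)


# Demo with two temporary files (hypothetical names and data)
with tempfile.TemporaryDirectory() as tmp:
    paths = []
    for name, values in [('a.csv', [1, 2]), ('b.csv', [3])]:
        path = os.path.join(tmp, name)
        pd.DataFrame({'x': values}).to_csv(path, index=False)
        paths.append(path)
    combined = combine_csvs(paths)

print(combined)
```

Collecting the frames in a list and concatenating once also tends to be faster than growing a DataFrame inside the loop, which may matter with 50 files.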