pythonpandasdataframexls

How do I merge multiple xls files into one dataframe in python?


I have excel files which each have 3 sheets and all same column headers.

I want to merge all so that I get one combined dataframe. This dataframe should have the sheetname,row number and xls filename listed next to the final dataframe.

i tried

 import pandas as pd
 dfe = pd.concat(pd.read_excel('abc1.xls', sheet_name=None), ignore_index=True)

this combined all sheets for one of the files but did not write the row numbers or sheetnames next to the data. How can I do the same?

Desired output-

Rownumber Sheetname Filename         State Region  Brand   
 1           tig       abc1.xls      CA         S   Go     
 2           con       abc2.xls      IA         A   Po

Solution

  • Here's how to do it. I used two functions. The first function reads all sheets within a single Excel file, and adds the sheet name. The second function takes all of the excel files, and uses the first function to read all the sheets in all the files.

    from pandas import pd
    def read_sheets(filename):
        result = []
        sheets = pd.read_excel(filename, sheet_name=None)
        for name, sheet in sheets.items():
            sheet['Sheetname'] = name
            sheet['Row'] = sheet.index
            result.append(sheet)
        return pd.concat(result, ignore_index=True)
    
    def read_files(filenames):
        result = []
        for filename in filenames:
            file = read_sheets(filename)
            file['Filename'] = filename
            result.append(file)
        return pd.concat(result, ignore_index=True)
    

    You can call this by providing a list of files to read:

    files = ['multisheet.xls', 'multisheet2.xls']
    read_files(files)
    

    For the example I tried it on, it produces a dataframe like this:

        A   B  A+B Sheetname  Row         Filename
    0   1  10   11    Sheet1    0   multisheet.xls
    1   2  11   13    Sheet1    1   multisheet.xls
    2   3  12   15    Sheet1    2   multisheet.xls
    3   4  13   17    Sheet1    3   multisheet.xls
    4   3  10   13    Sheet2    0   multisheet.xls
    5   3  11   14    Sheet2    1   multisheet.xls
    6   3  12   15    Sheet2    2   multisheet.xls
    7   3  13   16    Sheet2    3   multisheet.xls
    8   1  10   11    Sheet1    0  multisheet2.xls
    9   2  11   13    Sheet1    1  multisheet2.xls
    10  3  12   15    Sheet1    2  multisheet2.xls
    11  4  13   17    Sheet1    3  multisheet2.xls
    12  4  10   13    Sheet2    0  multisheet2.xls
    13  3  11   14    Sheet2    1  multisheet2.xls
    14  3  12   15    Sheet2    2  multisheet2.xls
    15  3  13   16    Sheet2    3  multisheet2.xls