pythonexcelpandasdataframeimport

reading excel to a python data frame starting from row 5 and including headers


I have an excel workbook that runs some vba on opening which refreshes a pivot table and does some other stuff.

Then I wish to import the results of the pivot table refresh into a dataframe in python for further analysis.

import xlrd

wb = xlrd.open_workbook('C:\Users\cb\Machine_Learning\cMap_Joins.xlsm')

The refreshing and opening of the file works fine. But how do I select the data from the first sheet from say row 5 including header down to last record n.


Solution

  • You can use pandas' ExcelFile parse method to read Excel sheets, see io docs:

    xls = pd.ExcelFile('C:\Users\cb\Machine_Learning\cMap_Joins.xlsm')
    
    df = xls.parse('Sheet1', skiprows=4, index_col=None, na_values=['NA'])
    

    skiprows will ignore the first 4 rows (i.e. start at row index 4), and several other options.