pythonexcelpandas

How can I force pandas to read the last Excel column when it is empty for the first few rows?


I have to parse data from Excel files (and I can't control the incoming format). Four columns have column headers; the fifth column does not, and is often blank -- sometimes there is no data at all in this column, and sometimes one or two rows may have data, so there can be an arbitrary number of blanks in that column near the top. If I do this:

pd.read_excel(excel_file, usecols="A:E")

I get a warning: FutureWarning: Defining usecols with out of bounds indices is deprecated and will raise a ParserError in a future version. and it doesn't give me column 5.

I have a workaround using openpyxl, but I would like to use pandas if possible. Is there a way to do this?


Solution

  • you can use dataframe.iloc the first value is the row range second value is the column range

    so in this example

    row will start at index 0 and end at the last entry

    Column will start at the second last column and end at the last column

    import pandas as pd
    
    data_frame = pd.read_excel("test.xlsx")
    
    print(data_frame.iloc[0:, -1:])