pythonexcelpandas

pd.read_excel() ignores filters


I'm working with some xlsx files and need to import them into program. I've written a script that handles everything I need it to do.

However, I need to apply filters to the table in excel before importing them in.

When I apply filters and prep the table to import into python, python reads the entire table ignoring all the filters.

My work around has been filtering what I need then copying that to a new sheet. Then when reading into excel I specify the new sheet of filtered data that I'm looking for.

Is there a way to read the filtered table into excel directly?

Or Should I just import the entire table and apply those same filters using pandas in my script instead?


Solution

  • IIUC, you can't read only visible rows and/or columns of an Excel spreadsheet with .

    To do that, you need some help from (!pip install openpyxl) :

    from openpyxl import load_workbook
    import pandas as pd
    
    wb = load_workbook("file.xlsx")
    ws = wb.active # or wb["SheetName"] # <- change the name here
    
    rows = [[c.value for c in r] for r in ws.iter_rows()
             if not ws.row_dimensions[r[0].row].hidden]
    
    df = pd.DataFrame(data= rows[1:], columns=rows[0])
    

    Output :

    print(df)
    
       col  col2
    0  foo   1.0
    2  baz   3.0
    

    Input used (spreadsheet) :

    enter image description here