pythonpandasexcel

pd.read_excel Read only excel formulas not values


I'm trying to read only the formulas in a excel file with Pandas, I was trying this:

import pandas as pd
pd.read_excel(file, engine="openpyxl", engine_kwargs={"data_only":False})

but I get this error:

TypeError: openpyxl.reader.excel.load_workbook() got multiple values for keyword argument 'data_only'

I would like to know how to do it, and why the error


Solution

  • I think that's an issue you might need to check or open on GitHub.

    As a workaround (in 2.1.0), you can comment the line 573 in the file io\excel\_openpyxl.py.

    df1 = pd.read_excel("file.xslx", engine="openpyxl", engine_kwargs={"data_only": False})
    

    Or you can simply use :

    from openpyxl import load_workbook
    
    wb = load_workbook("file.xlsx", data_only=False)
    
    vals = wb["Sheet1"].values;cols = next(vals)
    df1 = pd.DataFrame(vals, columns=cols)
    

    If you want to mask the plain data, you can use :

    funcs_only = [
        [cell.value if cell.data_type == "f" else ""
         for cell in row] for row in wb["Sheet1"]
    ]
    
    df2 = pd.DataFrame(funcs_only[1:], columns=next(wb["Sheet1"].values))
    

    Outputs :

    print(df1)
    
             col1      col2
    0         foo  =TODAY()
    1         bar         A
    2  =A2&"/"&A3         B
    
    
    print(df2)
    
             col1      col2
    0              =TODAY()
    1                      
    2  =A2&"/"&A3          
    

    Used input (file.xlsx) :

    enter image description here