I'm trying to read only the formulas in a excel file with Pandas, I was trying this:
import pandas as pd
pd.read_excel(file, engine="openpyxl", engine_kwargs={"data_only":False})
but I get this error:
TypeError: openpyxl.reader.excel.load_workbook() got multiple values for keyword argument 'data_only'
I would like to know how to do it, and why the error
I think that's an issue you might need to check or open on GitHub.
As a workaround (in 2.1.0
), you can comment the line 573 in the file io\excel\_openpyxl.py
.
df1 = pd.read_excel("file.xslx", engine="openpyxl", engine_kwargs={"data_only": False})
Or you can simply use openpyxl :
from openpyxl import load_workbook
wb = load_workbook("file.xlsx", data_only=False)
vals = wb["Sheet1"].values;cols = next(vals)
df1 = pd.DataFrame(vals, columns=cols)
If you want to mask the plain data, you can use :
funcs_only = [
[cell.value if cell.data_type == "f" else ""
for cell in row] for row in wb["Sheet1"]
]
df2 = pd.DataFrame(funcs_only[1:], columns=next(wb["Sheet1"].values))
Outputs :
print(df1)
col1 col2
0 foo =TODAY()
1 bar A
2 =A2&"/"&A3 B
print(df2)
col1 col2
0 =TODAY()
1
2 =A2&"/"&A3
Used input (file.xlsx) :