Is there a way to write a formula to an .xlsx file with pandas alone, i.e. without using tools like xlsxwriter / openpyxl?


I wrote a script that reads an .xlsx file (excel-comp-data.xlsx), creates a column that is the sum of three other columns, and saves everything to a new file:

import pandas 

df = pandas.read_excel("excel-comp-data.xlsx")

# pandas computes the sums here; only the resulting numbers are saved
df["total"] = df["Jan"] + df["Feb"] + df["Mar"]

df.to_excel("excel-comp-data-formula-by-pandas.xlsx")

The problem is that this code does not create a formula; it just adds the values up and places the results in the new column.

When I later open the new file in LibreOffice Calc and manually change any value in "Jan", "Feb", or "Mar", the corresponding value in the "total" column is not updated.
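The file holds plain numbers because `df["Jan"] + df["Feb"] + df["Mar"]` is evaluated immediately by pandas, element by element; nothing in the resulting column records where the numbers came from. A tiny sketch with made-up values (not the data from excel-comp-data.xlsx) illustrates this:

```python
import pandas as pd

# Small stand-in for the spreadsheet; these values are invented for illustration.
df = pd.DataFrame({"Jan": [100, 200], "Feb": [10, 20], "Mar": [1, 2]})

# Series addition runs right away in Python, so "total" holds integers, not a formula.
df["total"] = df["Jan"] + df["Feb"] + df["Mar"]
print(df["total"].tolist())  # [111, 222]
```

Whatever `to_excel` writes for this column is therefore a fixed value; a spreadsheet program has no way to recompute it.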

I have found some code snippets on SO that create formulas, but all of them use tools such as xlsxwriter. How can I create a formula in pandas without using such tools?

Is it at all doable?


Solution

  • How might I create a formula in pandas without using such tools?

    Pandas itself relies on xlsxwriter (or openpyxl) to write .xlsx files, so you are already using one of them even if you never import it directly.

    You can add formulas, instead of a static sum, like this:

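    import pandas

    df = pandas.read_excel("excel-comp-data.xlsx")

    # One formula string per data row. Row 1 holds the header, so data starts on row 2.
    # With the default index in column A, Jan, Feb and Mar land in columns H to J for this file.
    df["total"] = [f'=SUM(H{row}:J{row})' for row in range(2, df.shape[0] + 2)]

    # Strings beginning with "=" are written to the sheet as formulas.
    df.to_excel("excel-comp-data-formula-by-pandas.xlsx", engine='xlsxwriter')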
    

    Excel shows a formula warning (a green triangle) because it assumes, incorrectly, that the adjacent column G should also be part of the sum. You can ignore the warning, or swap columns F and G in the DataFrame so that column G no longer sits next to the summed range.

    It works with openpyxl as the engine as well.
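Hard-coded letters such as H and J break silently if the column layout changes. The sketch below derives the letters from the DataFrame's column positions instead. It is a minimal, hypothetical helper (`excel_column_letter` and `row_sum_formulas` are not part of pandas), assuming single-level column headers, the frame written at the top-left of the sheet, and one sheet column per index level when `index=True`. It builds `=D2+E2+F2`-style formulas rather than `SUM` over a range, so the summed columns do not have to be adjacent.

```python
from __future__ import annotations

import pandas as pd


def excel_column_letter(position: int) -> str:
    """Convert a 0-based column position to Excel letters (0 -> 'A', 26 -> 'AA')."""
    letters = ""
    n = position + 1
    while n:
        n, rem = divmod(n - 1, 26)
        letters = chr(ord("A") + rem) + letters
    return letters


def row_sum_formulas(df: pd.DataFrame, columns: list[str], index: bool = True) -> list[str]:
    """Hypothetical helper: one '=X2+Y2+...' formula per DataFrame row.

    Assumes single-level column headers (one header row) and that the frame
    is written at the top-left of the sheet (startrow=0, startcol=0).
    """
    # Each index level written by to_excel(index=True) takes one column on the left.
    offset = df.index.nlevels if index else 0
    letters = [excel_column_letter(df.columns.get_loc(c) + offset) for c in columns]
    # Row 1 holds the header, so data starts on Excel row 2.
    return [
        "=" + "+".join(f"{letter}{row}" for letter in letters)
        for row in range(2, len(df) + 2)
    ]


# Hypothetical frame; the column names and values are assumptions for illustration.
df = pd.DataFrame(
    {"account": [1, 2], "postal-code": [111, 222], "Jan": [5, 6], "Feb": [7, 8], "Mar": [9, 10]}
)
df["total"] = row_sum_formulas(df, ["Jan", "Feb", "Mar"])
print(df["total"].tolist())  # ['=D2+E2+F2', '=D3+E3+F3']
```

Writing this frame with `df.to_excel(..., engine='xlsxwriter')` or `engine='openpyxl'` should store these strings as formulas, since both libraries treat a string that begins with `=` as a formula. If you call `to_excel(..., index=False)`, pass `index=False` to the helper as well so the letters shift one column to the left.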