pythonpandasopenpyxlsetter

Property 'sheets' of 'OpenpyxlWriter' object has no setter using pandas and openpyxl


This code used to get a xlsx file and write over it, but after updating from pandas 1.1.5 to 1.5.1 I got zipfile.badzipfile file is not a zip file

Then I read here that after pandas 1.2.0 the pd.ExcelWriter(report_path, engine='openpyxl') creates a new file but as this is a completely empty file, openpyxl cannot load it.

Knowing that, I changed the code to this one, but now I'm getting AttributeError: property 'sheets' of 'OpenpyxlWriter' object has no setter. How should I handle this?

book = load_workbook('Resultados.xlsx')
    writer = pd.ExcelWriter('Resultados.xlsx', engine='openpyxl')
    writer.book = book
    writer.sheets = dict((ws.title, ws) for ws in book.worksheets)

    reader = pd.read_excel(r'Resultados.xlsx')

    df = pd.DataFrame.from_dict(dict_)
    df.to_excel(writer, index=False, header=False, startrow=len(reader) + 1)

    writer.close()

Solution

  • TLDR

    # run before initializing the ExcelWriter
    reader = pd.read_excel("Resultados.xlsx", engine="openpyxl")
    
    book = load_workbook("Resultados.xlsx")
    
    # use `with` to avoid other exceptions
    with pd.ExcelWriter("Resultados.xlsx", engine="openpyxl") as writer:
        writer.book = book
        writer.sheets.update(dict((ws.title, ws) for ws in book.worksheets))
    
        df.to_excel(writer, index=False, header=False, startrow=len(reader)+1)
    
    

    Details

    Recreating your problem with some fake data

    import numpy as np
    from openpyxl import load_workbook
    import pandas as pd
    
    
    if __name__ == "__main__":
    
        # make some random data
        np.random.seed(0)
        df = pd.DataFrame(np.random.random(size=(5, 5)))
    
        # this makes an existing file
        with pd.ExcelWriter("Resultados.xlsx", engine="openpyxl") as writer:
            df.to_excel(excel_writer=writer)
    
        # make new random data
        np.random.seed(1)
        df = pd.DataFrame(np.random.random(size=(5, 5)))
    
        # what you tried...
        book = load_workbook("Resultados.xlsx")
        writer = pd.ExcelWriter("Resultados.xlsx", engine="openpyxl")
        writer.book = book
        writer.sheets = dict((ws.title, ws) for ws in book.worksheets)
    
        reader = pd.read_excel("Resultados.xlsx")
    
        # skipping this step as we defined `df` differently
        # df = pd.DataFrame.from_dict(dict_)
        df.to_excel(writer, index=False, header=False, startrow=len(reader)+1)
    
        writer.close()
    
    

    We get the same error plus a FutureWarning

    ...\StackOverflow\answer.py:23: FutureWarning: Setting the `book` attribute is not part of the public API, usage can give unexpected or corrupted results and will be removed in a future version
      writer.book = book
    Traceback (most recent call last):
      File "...\StackOverflow\answer.py", line 24, in <module>
        writer.sheets = dict((ws.title, ws) for ws in book.worksheets)
    AttributeError: can't set attribute 'sheets'
    

    The AttributeError is because sheets is a property of the writer instance. If you're unfamiliar with it, here is a resource.

    In shorter terms, the exception is raised because sheets cannot be modified in the way you're trying. However, you can do this:

    # use the `.update` method
    writer.sheets.update(dict((ws.title, ws) for ws in book.worksheets))
    

    That will move us past the the AttributeError, but we'll hit a ValueError a couple lines down:

    reader = pd.read_excel("Resultados.xlsx")
    
    Traceback (most recent call last):
      File "...\StackOverflow\answer.py", line 26, in <module>
        reader = pd.read_excel("Resultados.xlsx")
      ...
      File "...\lib\site-packages\pandas\io\excel\_base.py", line 1656, in __init__
        raise ValueError(
    ValueError: Excel file format cannot be determined, you must specify an engine manually.
    

    Do what the error message says and supply an argument to the engine parameter

    reader = pd.read_excel("Resultados.xlsx", engine="openpyxl")
    

    And now we're back to your original zipfile.BadZipFile exception

    Traceback (most recent call last):
      File "...\StackOverflow\answer.py", line 26, in <module>
        reader = pd.read_excel("Resultados.xlsx", engine="openpyxl")
      ...
      File "...\Local\Programs\Python\Python310\lib\zipfile.py", line 1334, in _RealGetContents
        raise BadZipFile("File is not a zip file")
    zipfile.BadZipFile: File is not a zip file
    

    After a bit of toying, I noticed that the Resultados.xlsx file could not be opened manually after running this line:

    writer = pd.ExcelWriter("Resultados.xlsx", engine="openpyxl")
    

    excel error message

    So I reordered some of the steps in your code:

        # run before initializing the ExcelWriter
        reader = pd.read_excel("Resultados.xlsx", engine="openpyxl")
    
        book = load_workbook("Resultados.xlsx")
    
        # the old way
        # writer = pd.ExcelWriter("Resultados.xlsx", engine="openpyxl")
    
        with pd.ExcelWriter("Resultados.xlsx", engine="openpyxl") as writer:
            writer.book = book
            writer.sheets.update(dict((ws.title, ws) for ws in book.worksheets))
    
            df.to_excel(writer, index=False, header=False, startrow=len(reader)+1)