pythonexcelcopyopenpyxl

How to create a copy of an Excel file with rich text using Python


I am looking for a method to copy Excel data from an other file. I have a 'file_1.xlsx', and I would like to create a 'file_1_copy.xlsx' that does not exist. The copy has to be exactly the same that the original file.

Is there some method in Python to copy an Excel file completely?

I tried to write code that does this manually:

import openpyxl
from openpyxl.styles import Font

wb = openpyxl.load_workbook('test excel.xlsx')
ws = wb.active

new_wb = openpyxl.Workbook()
new_ws = new_wb.active

for row in ws.iter_rows():
    for cell in row:
        new_cell = new_ws.cell(row=cell.row, column=cell.col_idx, value=cell.value)
        if cell.has_style:
            new_cell.font = Font(name=cell.font.name, size=cell.font.size, bold=cell.font.bold, italic=cell.font.italic, color=cell.font.color)

new_wb.save('test copie excel.xlsx')

enter image description here

In the image above, on the left is the original file. On the right, the copy.

The issue is that the text in the first cell of the copy is supposed to be in 2 colors: red and blue, and highlighted in yellow. In the copy, I got only one color, all the text became red, and is not highlighted anymore.


Solution

  • Excel cells normally have only one font. If there is text in rich text format in Excel cells, then the cell content itself is rich text cell content. But rich text cell content is not read while load workbook file per default for performance reasons. So one must set rich_text=True in load_workbook('test excel.xlsx', rich_text=True) if needed.

    The color is the fill color of the cell. To keep this, the fill must also be copied.

    Btw: There is copy from copy which can be used to copy whole cell.font and cell.fill to avoid the need to copy each single font- and fill-property.

    And the column widths are worksheet dimensions settings, which needs to be set for the new_ws too if they shall be the same.

    from openpyxl import load_workbook, Workbook
    from copy import copy
    
    wb = load_workbook('test excel.xlsx', rich_text=True)
    ws = wb.active
    
    new_wb = Workbook()
    new_ws = new_wb.active
    
    for row in ws.iter_rows():
        for cell in row:
            new_cell = new_ws.cell(row=cell.row, column=cell.col_idx, value=cell.value)
            new_cell.font = copy(cell.font)
            new_cell.fill = copy(cell.fill)
    
    for column_name in ws.column_dimensions:
        new_ws.column_dimensions[column_name].width = ws.column_dimensions[column_name].width
                
    new_wb.save('test copie excel.xlsx')
    

    That is for cell- and worksheet-properties you have now. if the need is copying all possible cell- and worksheet-properties, then the code gets very long and complex.

    Above code works using openpyxl version 3.1.2. Version 3.1.3 seems to have bugs while copying rich text from shared-strings into inline-strings when text runs contain spaces only. Then the XML for the text should be <t xml:space="preserve"> </t> but is <t> </t> using Version 3.1.3. That leads to an error while opening the file in Excel.

    But would not using another approach more convenient? Open the test excel.xlsxusing wb = openpyxl.load_workbook('test excel.xlsx', rich_text=True). Then do the needed changing in wb. After all use wb.save('test copie excel.xlsx') to save the file under a new file name. No need for new_wb and new_ws at all.