
python-docx: replacing a text placeholder with a table


I'm confused about how to insert a table in the middle of a document. Is it possible to turn a text placeholder into a table with python-docx?
For example:

*** PARAGRAPH ****

TEXT_PLACEHOLDER

*** PARAGRAPH****

I want to replace TEXT_PLACEHOLDER with a table. Is that possible?

I tried the code below. It creates the table, but puts it after the second paragraph instead of between the two paragraphs where the placeholder was.

for paragraph in doc.paragraphs:
    if placeholder in paragraph.text:
        paragraph.text = paragraph.text.replace(placeholder, '')

        table = doc.add_table(rows=3, cols=3)

        for row in range(3):
            for col in range(3):
                table.cell(row, col).text = f"Row {row + 1}, Col {col + 1}"

Solution

  • Base Explanation of Issue

    As far as I know, the key point is that python-docx parses the ".docx" file into an Open XML tree behind the scenes and wraps those XML elements in Python classes for easier use. The Document class then exposes these wrappers filtered by type (doc.paragraphs, doc.tables, and so on), which hides how the elements are ordered relative to one another.

    In your example you are working only with those filtered lists of wrapped elements, which limits what you can change directly without some extra work. In particular, doc.add_table() always appends the new table to the end of the document body, which is why it lands after the second paragraph. There are two main ways to do what you want, described below.

    NOTE: Both solutions replace the whole paragraph that contains "TEXT_PLACEHOLDER", not just the placeholder text inside it. They should be easy to modify if you need that.

    Solution 1: Simple but limited

    This solution may be enough if you are only working with paragraph elements, as in your example. Create a second, blank Document object and copy the paragraphs into it one by one. Both add_paragraph() and add_table() append to the end of the document, so adding the table at the moment you reach the placeholder puts it in the right position.

    import docx
    
    REPLACE_KEY = "TEXT_PLACEHOLDER"
    
    doc = docx.Document("example.docx")
    new_doc = docx.Document()  # blank document that receives the copied content
    for par in doc.paragraphs:
        if REPLACE_KEY in par.text:
            # Placeholder paragraph: add the table here instead of copying it.
            table = new_doc.add_table(rows=3, cols=3)
            for row in range(3):
                for col in range(3):
                    table.cell(row, col).text = f"Row {row + 1}, Col {col + 1}"
        else:
            # Copies only text and paragraph style; run-level formatting is lost.
            new_doc.add_paragraph(par.text, par.style)
    new_doc.save("example_modified.docx")
    

    Solution 2: Complete

    For this solution, we bypass the wrapper objects and work directly with the Open XML elements underneath to swap one element for another. The underlying OXML classes are all named along the lines of 'CT_{type}'.

    First, since we want to create a table without the convenience method (which appends it to the end of the document), we create a CT_Tbl OXML element ourselves and then wrap it in a Table object for easy use.

    Second, to do the replacement we call the underlying element's replace function on the paragraph element stored in the wrapper's _p attribute. That function replaces a child, so we first have to get the paragraph element's parent and perform the replacement within it.

    from docx import Document as f_Document  # factory function that opens a file
    from docx.document import Document  # the class that f_Document() returns
    from docx.oxml.table import CT_Tbl
    from docx.oxml.xmlchemy import BaseOxmlElement
    from docx.table import Table
    from docx.types import ProvidesStoryPart
    
    def replace_with_table(doc: Document, elm: BaseOxmlElement, parent: ProvidesStoryPart, rows: int, cols: int) -> Table:
        # Build a bare <w:tbl> element as wide as the page's usable text area.
        table_elm = CT_Tbl.new_tbl(rows, cols, doc._block_width)
        # Swap the paragraph element for the table inside their shared parent.
        elm.getparent().replace(elm, table_elm)
        # Wrap the raw element so the usual Table API is available.
        return Table(table_elm, parent)
    
    REPLACE_KEY = "TEXT_PLACEHOLDER"
    
    doc = f_Document("example.docx")
    for par in doc.paragraphs:
        if REPLACE_KEY in par.text:
            # par._p is the paragraph's XML element; par._parent is its container.
            table = replace_with_table(doc, par._p, par._parent, 3, 3)
            for row in range(3):
                for col in range(3):
                    table.cell(row, col).text = f"Row {row + 1}, Col {col + 1}"
    doc.save("example_modified.docx")