
Using copy.deepcopy with python-pptx to add a column to a table leads to cell attributes being corrupted


I'm trying to append a column to a table in PowerPoint using python-pptx. Several threads suggest the following solution:

import copy

from pptx.table import _Cell

def append_col(prs_obj, sl_i, sh_i):
    # prs_obj is a pptx.Presentation('path') object.
    # sl_i and sh_i are int indexes that locate a particular table shape.

    tab = prs_obj.slides[sl_i].shapes[sh_i].table
    # copy the last grid column definition
    new_col = copy.deepcopy(tab._tbl.tblGrid.gridCol_lst[-1])
    tab._tbl.tblGrid.append(new_col)

    for tr in tab._tbl.tr_lst:
        # duplicate the last cell of each row
        new_tc = copy.deepcopy(tr.tc_lst[-1])
        tr.append(new_tc)
        cell = _Cell(new_tc, tr.tc_lst)
        cell.text = '--'
    return tab

After running this and opening the file in PowerPoint, the new column is there, but it doesn't show the cell.text value. If you click in a new cell and type, the letters appear in the cell of the previous column. Saving the presentation from PowerPoint lets you edit the column as normal, but by then you've lost the cell.text value (and the formatting).

QUESTION UPDATE 1 - FOLLOWING COMMENT FROM @scanny

For the simplest possible case, a 1x3 table like |xx|--|xx|, the printouts of tab._tbl.xml before and after appending the column are:

xml diff 1

xml diff 2

xml diff 3

xml diff 4

QUESTION UPDATE 2 - FOLLOWING COMMENT FROM @scanny

I modified the append_col function above to forcibly remove the extLst element from the copied gridCol. This fixed the problem of typing in one cell and the text appearing in another cell.

from pptx.oxml.table import CT_TableGrid

def append_col(prs_obj, sl_i, sh_i):
    # existing lines removed for brevity

    # New Code
    for child in tab._tbl:
        if isinstance(child, CT_TableGrid):
            ws = set()  # column widths seen so far
            for j in child:
                if j.w not in ws:
                    ws.add(j.w)
                else:
                    # a repeated width is taken to mark the copied gridCol;
                    # strip its children (the extLst)
                    for elem in list(j):
                        j.remove(elem)
    return tab

However, cell.text (and the formatting) is still missing. Moreover, manually saving the presentation from PowerPoint changes the tab._tbl.xml output back. The screenshots from before and after manually opening and saving the presentation in PowerPoint are:

AFTER removing extLst, before manual save - xml diff 1

AFTER removing extLst, AFTER manual save - xml diff 2


Solution

  • If you're serious about solving this sort of problem, you'll need to reverse-engineer the PowerPoint XML for this aspect of tables.

    The place to start is with XML dumps of the table taken before and after adding a column in PowerPoint itself. Identify the changes PowerPoint made, then duplicate the ones that matter (things like revision numbers probably don't).

    This process is simpler with a small example, say growing a 2 x 2 table into a 2 x 3 table.

    You can get the XML for a python-pptx XML element using its .xml attribute, like:

    print(tab._tbl.xml)
    

    You could then compare those dumps with the XML your deepcopy approach produces; the concrete differences are the place to start explaining why it doesn't work. I expect you'll find that table items have unique ids, and that funky things happen when you duplicate them.
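
The comparison described above can be sketched with the standard library alone. This is only a rough sketch, not a fix: the helper names xml_diff and repeated_attr_values are made up for illustration, and the defaults local_name="colId" and attr="val" are an assumption about where the per-column id lives (inside the extLst under gridCol mentioned in Update 2). Adjust them to whatever your own dumps show.

```python
import difflib
import xml.etree.ElementTree as ET
from collections import Counter


def xml_diff(before_xml, after_xml):
    """Return a unified diff of two XML dumps as a single string."""
    return "\n".join(
        difflib.unified_diff(
            before_xml.splitlines(),
            after_xml.splitlines(),
            fromfile="before",
            tofile="after",
            lineterm="",
        )
    )


def repeated_attr_values(xml_text, attr="val", local_name="colId"):
    """List values of `attr` that occur on more than one `local_name` element.

    The default names are assumptions; change them to match your XML.
    """
    counts = Counter(
        el.get(attr)
        for el in ET.fromstring(xml_text).iter()
        # el.tag looks like "{namespace-uri}localname"; compare the local part only
        if el.tag.rsplit("}", 1)[-1] == local_name and el.get(attr) is not None
    )
    return sorted(value for value, n in counts.items() if n > 1)
```

To use it, capture before = tab._tbl.xml ahead of the append_col call and after = tab._tbl.xml afterwards, then print(xml_diff(before, after)) and check repeated_attr_values(after). A non-empty list would support the duplicated-id explanation, though it doesn't prove that this is the only thing PowerPoint objects to.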