pythonpython-3.xms-wordstylespython-docx

How to combine multiple docx files into a single in python


I combined multiple text files into a single text file using simple code:

with open("Combined_file.txt", 'w') as f1:
    for indx1, fil1 in enumerate(files_to_combine):
        with open(files_to_combine[indx1], 'r') as f2:
            for line1 in f2:
                f1.write(line1)
            f1.write("\n")

files_to_combine is a list containing files to combine into a single file Combined_file.txt. I want to combine MS Word .docx files similar to above and looked at this answer https://stackoverflow.com/a/48925828 using python-docx module. But I couldn't figure out how to open and save a docx file under top for loop of above code, since with open construct won't work here. Also, if the source docx file contains an image, can it be copied using above code and the answer code?


Solution

  • You will need python-docx for that. From there it is pretty straightforward.

    from docx import Document
    
    # Create a new Document object for the combined file
    combined_file = Document()
    
    for file in files_to_combine:
        doc = Document(file)
        
        # If you want to do it manually 
        for para in doc.paragraphs:
            combined_file.add_paragraph(para.text)
    
        # This will needed if you have images
        for rel in doc.part.rels.values():
            if "image" in rel.target_ref:
                combined_file.add_picture(rel.target_part.blob)
    
        # Same goes for charts and other types 
        # in docs find methods for an each type
    
        # This will copy all document 
        # I commented so you can decided which way to choose
        # for element in doc.element.body:
            # combined_file.element.body.append(element)
    
        combined_file.add_page_break()
    
    combined_file.save('Combined_file.docx')