Tags: java, replace, ms-word, apache-poi, bookmarks

How to replace the content set by bookmarks in Word using POI?


I am working on a function that replaces the content of bookmarks, and I wrote the example below. In the input Word document, the text Bill carries the bookmark 'name'. The code replaces that bookmarked text with Ryan by deleting the content within the bookmark and inserting a new XWPFRun. As a simple demo, it produces the expected output file.

The problem is that, after the replacement, iterating over the runs returned by xwpfParagraph.getRuns() throws an exception.

In my actual project, each bookmark corresponds to an XWPFRun. Is it possible to find the corresponding XWPFRun directly through the bookmark and set its content? Since XWPFRuns can have different fonts, setting the text on the existing run would mean the font does not need to be set separately.

package com.office;

import org.apache.poi.openxml4j.opc.OPCPackage;
import org.apache.poi.xwpf.usermodel.XWPFDocument;
import org.apache.poi.xwpf.usermodel.XWPFParagraph;
import org.apache.poi.xwpf.usermodel.XWPFRun;
import org.apache.xmlbeans.XmlCursor;
import org.openxmlformats.schemas.wordprocessingml.x2006.main.*;
import org.w3c.dom.Node;

import java.io.*;
import java.math.BigInteger;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

public class BookMarkTest {


    public static void main(String[] args) {
        try {
            FileInputStream is = new FileInputStream("f:\\test.docx");
            XWPFDocument document = new XWPFDocument(is);

            Map<String, Object> bookTagMap = new HashMap<>();
            bookTagMap.put("name", "Ryan");
            replaceBookTag(document, bookTagMap);

            FileOutputStream os = new FileOutputStream("f:\\test1.docx");
            document.write(os);

        } catch (Exception e) {
            e.printStackTrace();
        }
    }


    public static void replaceBookTag(XWPFDocument document, Map<String, Object> bookTagMap) {
        List<XWPFParagraph> paragraphList = document.getParagraphs();
        for (XWPFParagraph xwpfParagraph : paragraphList) {
            CTP ctp = xwpfParagraph.getCTP();
            List<CTBookmark> bookmarks =  ctp.getBookmarkStartList();
            for(CTBookmark bookmark: bookmarks) {
                if (bookTagMap.containsKey(bookmark.getName())) {

                    XWPFRun run = xwpfParagraph.createRun();
                    run.setText(bookTagMap.get(bookmark.getName()).toString());

                    Node firstNode = bookmark.getDomNode();
                    Node nextNode = firstNode.getNextSibling();
                    while (nextNode != null) {
                        // Loop until the bookmark end marker is found
                        String nodeName = nextNode.getNodeName();
                        if (nodeName.equals("w:bookmarkEnd")) {
                            break;
                        }

                        // Remove the nodes in between, i.e. delete the original bookmark content
                        Node delNode = nextNode;
                        nextNode = nextNode.getNextSibling();

                        ctp.getDomNode().removeChild(delNode);
                    }

                    if (nextNode == null) {
                        // No end marker was found, so insert the new run before the bookmark start
                        ctp.getDomNode().insertBefore(run.getCTR().getDomNode(), firstNode);
                    } else {
                        // End marker found: insert the new content before it, i.e. inside the bookmark
                        ctp.getDomNode().insertBefore(run.getCTR().getDomNode(), nextNode);
                    }
                }
            }

        }

        for (XWPFParagraph xwpfParagraph : paragraphList) {
            for(XWPFRun run: xwpfParagraph.getRuns()){
                System.out.println(run.text());
            }
        }
    }

}

Here is the exception information:

org.apache.xmlbeans.impl.values.XmlValueDisconnectedException
    at org.apache.xmlbeans.impl.values.XmlObjectBase.check_orphaned(XmlObjectBase.java:1258)
    at org.apache.xmlbeans.impl.values.XmlObjectBase.newCursor(XmlObjectBase.java:286)
    at org.apache.poi.xwpf.usermodel.XWPFRun.text(XWPFRun.java:1262)
    at com.office.BookMarkTest.replaceBookTag(BookMarkTest.java:79)
    at com.office.BookMarkTest.main(BookMarkTest.java:27)

input word test.docx

output word test1.docx

In actual projects, Word files are often large, with numerous bookmarks, which may result in performance issues. Is there an efficient way to locate an XWPFRun through a bookmark?


Solution

  • In the Office Open XML format (*.docx) of Microsoft Word, a bookmarked text part lies between a bookmarkStart and a bookmarkEnd element in document.xml.

    Note that the text to be bookmarked needs to be between the bookmarkStart and bookmarkEnd elements. To achieve this, the text run needs to be selected while setting the bookmark. If nothing is selected while setting the bookmark, there is nothing between bookmarkStart and bookmarkEnd either. Then nothing is bookmarked, and the following will not work.

    Unzip test.docx and have a look into /word/document.xml. For a bookmarked run you will find XML like this:

    ...
    <w:bookmarkStart w:id="0" w:name="name"/>
    <w:r>
     <w:t>name</w:t>
    </w:r>
    <w:bookmarkEnd w:id="0"/>
    ...
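Inspecting /word/document.xml does not require a manual unzip. The following is a minimal sketch, using only the JDK's java.util.zip classes, of reading a single entry from a .docx archive. The class name DocxXmlPeek, both helper names, and the in-memory sample archive are illustrative assumptions and not part of Apache POI; the sketch also assumes the entry is UTF-8 encoded.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.util.zip.ZipEntry;
import java.util.zip.ZipInputStream;
import java.util.zip.ZipOutputStream;

class DocxXmlPeek {

    // Returns the content of the named entry (e.g. "word/document.xml"),
    // or null if the archive contains no such entry.
    static String readEntry(InputStream docx, String entryName) {
        try (ZipInputStream zip = new ZipInputStream(docx)) {
            ZipEntry entry;
            while ((entry = zip.getNextEntry()) != null) {
                if (!entry.getName().equals(entryName)) {
                    continue; // not the entry we are looking for
                }
                ByteArrayOutputStream content = new ByteArrayOutputStream();
                byte[] buffer = new byte[8192];
                int read;
                // read() returns -1 at the end of the current entry
                while ((read = zip.read(buffer)) != -1) {
                    content.write(buffer, 0, read);
                }
                return new String(content.toByteArray(), StandardCharsets.UTF_8);
            }
            return null;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    // Hypothetical helper: builds a tiny stand-in archive in memory so the
    // sketch can run without a real test.docx on disk.
    static byte[] sampleArchive(String documentXml) {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (ZipOutputStream zip = new ZipOutputStream(bytes)) {
            zip.putNextEntry(new ZipEntry("word/document.xml"));
            zip.write(documentXml.getBytes(StandardCharsets.UTF_8));
            zip.closeEntry();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return bytes.toByteArray();
    }

    public static void main(String[] args) {
        byte[] archive = sampleArchive("<w:bookmarkStart w:id=\"0\" w:name=\"name\"/>");
        System.out.println(readEntry(new ByteArrayInputStream(archive), "word/document.xml"));
    }
}
```

For a real document, passing a FileInputStream for test.docx to readEntry instead of the in-memory sample should print the actual XML.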
    

    So, while looping over paragraphs and runs, one can check whether a run has a bookmarkStart as its previous sibling in the XML. If so, this run is bookmarked.

    To get the previous sibling in the XML, an XmlCursor is needed.

    Complete code example:

    import java.io.FileInputStream;
    import java.io.FileOutputStream;
    
    import org.apache.poi.xwpf.usermodel.*;
    import org.openxmlformats.schemas.wordprocessingml.x2006.main.CTBookmark;
    import org.apache.xmlbeans.XmlCursor;
    
    import java.util.Map;
    import java.util.HashMap;
    
    public class WordFindBookmarkedXWPFRun {
        
     static boolean isXWPFRunAfterBookmark(XWPFRun run, String bookmarkName) {
      XmlCursor cursor = run.getCTR().newCursor();
      if (!cursor.toPrevSibling()) { // is there a previous sibling?
       return false; // if not, then there is no bookmark
      }
      if (!(cursor.getObject() instanceof CTBookmark)) { // is previous sibling instance of CTBookmark?
       return false; // if not, then there is no bookmark
      } 
      CTBookmark bookmarkStart = (CTBookmark)cursor.getObject();
      if (!bookmarkName.equals(bookmarkStart.getName())) { // is bookmark name equal to searched name?
       return false; // if not, then this is not searched bookmark
      }  
      return true; // this run is immediately after searched bookmark
     }
    
     public static void main(String[] args) throws Exception {
    
      XWPFDocument document = new XWPFDocument(new FileInputStream("./test.docx"));
    
      Map<String, String> bookTagMap = new HashMap<>();
      bookTagMap.put("name", "Axel Richter");  
      bookTagMap.put("amount", "4,567.89");  
      bookTagMap.put("date", "2023-12-27");  
    
      for (XWPFParagraph paragraph : document.getParagraphs()) {
       for (XWPFRun run : paragraph.getRuns()) {
        for (String key : bookTagMap.keySet()) {
         if (isXWPFRunAfterBookmark(run, key )) { // check if the run is after a bookmark having this key as name
          // if so, set run text to given string
          run.setText(bookTagMap.get(key), 0);
         }
        }
       }
      }
    
      FileOutputStream out = new FileOutputStream("./test1.docx");
      document.write(out);
      out.close();
      document.close();
    
     }
    }
    


    This code sample is tested and works for me with a test.docx created in Microsoft Word 365, using Apache POI versions 3.17, 4.0.*, and the current 5.2.5.
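The previous-sibling check can also be illustrated at the plain XML level, without Apache POI. The sketch below uses only the JDK's DOM parser on a hand-written paragraph fragment. It reads the bookmark name from the preceding bookmarkStart once per run and looks it up in the map, instead of testing every key for every run. This variation is an assumption about what may help with many bookmarks, not something that has been measured. The class name BookmarkSiblingSketch, its helpers, and the SAMPLE fragment are hypothetical. As with the approach above, only a run that immediately follows the bookmarkStart is handled.

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import java.util.HashMap;
import java.util.Map;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.Node;
import org.w3c.dom.NodeList;

class BookmarkSiblingSketch {

    static final String W_NS = "http://schemas.openxmlformats.org/wordprocessingml/2006/main";

    // Hand-written stand-in for a paragraph in /word/document.xml (an assumption).
    static final String SAMPLE =
        "<w:p xmlns:w=\"" + W_NS + "\">"
        + "<w:r><w:t>Hello </w:t></w:r>"
        + "<w:bookmarkStart w:id=\"0\" w:name=\"name\"/>"
        + "<w:r><w:t>Bill</w:t></w:r>"
        + "<w:bookmarkEnd w:id=\"0\"/>"
        + "</w:p>";

    static Document parse(String xml) {
        try {
            DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
            factory.setNamespaceAware(true); // needed for the *NS lookups below
            return factory.newDocumentBuilder()
                .parse(new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8)));
        } catch (Exception e) {
            throw new IllegalStateException("could not parse XML", e);
        }
    }

    // Returns the nearest previous sibling that is an element, or null.
    static Element previousElement(Node node) {
        Node prev = node.getPreviousSibling();
        while (prev != null && prev.getNodeType() != Node.ELEMENT_NODE) {
            prev = prev.getPreviousSibling();
        }
        return (Element) prev;
    }

    // Sets the text of every run whose previous sibling is a bookmarkStart
    // with a name present in the map; returns the number of runs changed.
    static int replaceBookmarkedRuns(Document doc, Map<String, String> values) {
        int replaced = 0;
        NodeList runs = doc.getElementsByTagNameNS(W_NS, "r");
        for (int i = 0; i < runs.getLength(); i++) {
            Element run = (Element) runs.item(i);
            Element prev = previousElement(run);
            if (prev == null
                    || !W_NS.equals(prev.getNamespaceURI())
                    || !"bookmarkStart".equals(prev.getLocalName())) {
                continue; // this run does not directly follow a bookmark start
            }
            // one map lookup per run instead of one check per key
            String value = values.get(prev.getAttributeNS(W_NS, "name"));
            NodeList texts = run.getElementsByTagNameNS(W_NS, "t");
            if (value != null && texts.getLength() > 0) {
                texts.item(0).setTextContent(value);
                replaced++;
            }
        }
        return replaced;
    }

    public static void main(String[] args) {
        Document doc = parse(SAMPLE);
        Map<String, String> values = new HashMap<>();
        values.put("name", "Ryan");
        replaceBookmarkedRuns(doc, values);
        System.out.println(doc.getDocumentElement().getTextContent()); // prints "Hello Ryan"
    }
}
```

The same idea should carry over to the XmlCursor version: after cursor.toPrevSibling(), the name of the CTBookmark can serve as the key for a single bookTagMap.get(...) call.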