I'm writing a bidi String to an MS Word file using Apache POI, after wrapping it with this sequence:
aString = "\u202E" + aString + "\u202C";
The text renders correctly in the file, and reads fine when I retrieve the string again. But if I modify the file in any way, reading that string back suddenly gives a value for which isBlank() returns true.
Thank you in advance for any suggestions/help!
When Microsoft Word stores bidirectional text in its Office Open XML *.docx format, it sometimes wraps the text runs in a special w:bdo element (bidirectional override). Apache POI does not read those elements yet. So if an XWPFParagraph contains such elements, paragraph.getText() will not return the text inside them, and it will return an empty string when all of the paragraph's text sits inside w:bdo.
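To make the structure concrete, here is a minimal sketch that uses only the JDK's DOM parser, with no POI involved. The SAMPLE paragraph is hand-written for illustration, not taken from a real document, and textOfDirectRuns is a simplified stand-in for a reader that only looks at runs directly under w:p; it is not POI's actual implementation. The class and method names are hypothetical.

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.Node;
import org.w3c.dom.NodeList;

public class BdoParagraphDemo {

    static final String W_NS = "http://schemas.openxmlformats.org/wordprocessingml/2006/main";

    // Hand-written sample (an assumption for illustration): one plain run,
    // followed by a run that is wrapped in a w:bdo element.
    static final String SAMPLE =
        "<w:p xmlns:w=\"" + W_NS + "\">"
        + "<w:r><w:t>plain </w:t></w:r>"
        + "<w:bdo w:val=\"rtl\"><w:r><w:t>wrapped</w:t></w:r></w:bdo>"
        + "</w:p>";

    // Parses an XML string with namespace support and returns its root element.
    static Element parse(String xml) throws Exception {
        DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
        factory.setNamespaceAware(true);
        Document doc = factory.newDocumentBuilder()
            .parse(new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8)));
        return doc.getDocumentElement();
    }

    // Appends the text of every w:t element found below the given element.
    static void appendTexts(Element element, StringBuilder sb) {
        NodeList texts = element.getElementsByTagNameNS(W_NS, "t");
        for (int i = 0; i < texts.getLength(); i++) {
            sb.append(texts.item(i).getTextContent());
        }
    }

    // Simplified reader: only considers w:r elements that are direct children of w:p,
    // so runs nested inside w:bdo are never visited.
    static String textOfDirectRuns(Element paragraph) {
        StringBuilder sb = new StringBuilder();
        for (Node child = paragraph.getFirstChild(); child != null; child = child.getNextSibling()) {
            boolean isRun = child.getNodeType() == Node.ELEMENT_NODE
                && W_NS.equals(child.getNamespaceURI())
                && "r".equals(child.getLocalName());
            if (isRun) {
                appendTexts((Element) child, sb);
            }
        }
        return sb.toString();
    }

    // Reader that collects every w:t descendant of the paragraph, at any depth.
    static String textOfAllRuns(Element paragraph) {
        StringBuilder sb = new StringBuilder();
        appendTexts(paragraph, sb);
        return sb.toString();
    }

    public static void main(String[] args) throws Exception {
        Element paragraph = parse(SAMPLE);
        System.out.println("[" + textOfDirectRuns(paragraph) + "]"); // prints "[plain ]"
        System.out.println("[" + textOfAllRuns(paragraph) + "]");    // prints "[plain wrapped]"
    }
}
```

The first method misses the wrapped text because the run holding it is a grandchild of w:p, which is the same kind of gap that makes the text disappear once Word has rewritten the paragraph.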
One could use org.apache.xmlbeans.XmlCursor to really get all text from all XWPFParagraphs, like so:
import java.io.FileInputStream;
import org.apache.poi.xwpf.usermodel.*;
import org.apache.xmlbeans.XmlCursor;

public class ReadWordParagraphs {

    // Returns the concatenated text of everything under the paragraph's XML,
    // including text inside w:bdo elements.
    static String getAllTextFromParagraph(XWPFParagraph paragraph) {
        XmlCursor cursor = paragraph.getCTP().newCursor();
        try {
            return cursor.getTextValue();
        } finally {
            cursor.dispose(); // release the cursor (newer XMLBeans versions also offer close())
        }
    }

    public static void main(String[] args) throws Exception {
        // try-with-resources closes both the input stream and the document
        try (FileInputStream in = new FileInputStream("WordDocument.docx");
             XWPFDocument document = new XWPFDocument(in)) {
            for (XWPFParagraph paragraph : document.getParagraphs()) {
                System.out.println(paragraph.getText()); // will not return text in w:bdo elements
                System.out.println(getAllTextFromParagraph(paragraph)); // will return all text content of paragraph
            }
        }
    }
}