I would like to find the text between two keywords in .docx files and conditionally render or hide it. For example:
Lorem Ipsum is simply dummy text
${if condition}
of the printing and typesetting industry. Lorem Ipsum has been the industry's standard dummy text ever since the 1500s${endif}
When I parse the document using Apache POI, I would like to find every piece of content between the ${if condition} and ${endif} markers and conditionally render it, or leave it out, in the document I produce.
So the above text after my parsing should have the following two different forms:
1) In case the condition is satisfied
Lorem Ipsum is simply dummy text of the printing and typesetting industry. Lorem Ipsum has been the industry's standard dummy text ever since the 1500s
or
2) In case the condition is not satisfied
Lorem Ipsum is simply dummy text
I have tried to do this using the XWPFParagraph object and then XWPFRun, but that is not reliable, because a run can be split in the middle of a word under unpredictable conditions.
Could you please propose any reliable way to achieve my use case? Thanks in advance.
Take this as an example (code is tested):
import java.util.List;
import java.util.function.Function;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

import org.apache.poi.xwpf.usermodel.XWPFParagraph;
import org.apache.poi.xwpf.usermodel.XWPFRun;
import org.openxmlformats.schemas.wordprocessingml.x2006.main.CTRPr;

class ParagraphModifier {
    private final Pattern pIf = Pattern.compile("\\$\\{if\\s+(\\w+)\\}");
    private final Pattern pEIf = Pattern.compile("\\$\\{endif\\}");
    // Receives the condition name and decides whether the block is kept
    private final Function<String, Boolean> processor;

    public ParagraphModifier(Function<String, Boolean> processor) {
        this.processor = processor;
    }

    // Minimal holder: run index (key) and matched token text (value)
    static class Pair<K, V> {
        public K key;
        public V value;

        public Pair(K key, V value) {
            this.key = key;
            this.value = value;
        }
    }

    // Text of a run; a run without a text node counts as empty instead of null
    static String runText(XWPFRun run) {
        String text = run.getText(0);
        return text == null ? "" : text;
    }

    // https://stackoverflow.com/questions/23112924
    // Copies the formatting (if any) and the text of source into clone
    public static void cloneRun(XWPFRun clone, XWPFRun source) {
        if (source.getCTR().isSetRPr()) {
            CTRPr rPr = clone.getCTR().isSetRPr() ? clone.getCTR().getRPr() : clone.getCTR().addNewRPr();
            rPr.set(source.getCTR().getRPr());
        }
        clone.setText(runText(source));
    }

    // Splits the runs of the paragraph at a text offset and returns the index
    // of the run that starts at that offset
    int splitAtTextPosition(XWPFParagraph paragraph, int position) {
        List<XWPFRun> runs = paragraph.getRuns();
        int offset = 0;
        for (int i = 0; i < runs.size(); i++) {
            XWPFRun run = runs.get(i);
            String text = runText(run);
            int length = text.length();
            if (position >= (offset + length)) {
                offset += length;
                continue;
            }
            // Do split
            XWPFRun run2 = paragraph.insertNewRun(i + 1);
            cloneRun(run2, run);
            run.setText(text.substring(0, position - offset), 0);
            run2.setText(text.substring(position - offset), 0);
            return i + 1;
        }
        // An offset at the very end of the text maps to "one past the last run",
        // so a token that ends the paragraph can still be removed
        return position == offset ? runs.size() : -1;
    }

    // Concatenated text of all runs; offsets into it match splitAtTextPosition
    String getParagraphText(XWPFParagraph paragraph) {
        StringBuilder sb = new StringBuilder();
        for (XWPFRun run : paragraph.getRuns()) {
            sb.append(runText(run));
        }
        return sb.toString();
    }

    // Removes the runs with index from (inclusive) to to (exclusive)
    void removeRunsRange(XWPFParagraph paragraph, int from, int to) {
        int runs = paragraph.getRuns().size();
        to = Math.min(to, runs);
        for (int i = (to - 1); i >= from; i--) {
            paragraph.removeRun(i);
        }
    }

    // Finds the first match of pattern, isolates it in its own runs, removes
    // them, and returns the run index where the token was plus its text
    Pair<Integer, String> extractToken(Pattern pattern, XWPFParagraph paragraph) {
        String text = getParagraphText(paragraph);
        Matcher matcher = pattern.matcher(text);
        if (matcher.find()) {
            int rStart = splitAtTextPosition(paragraph, matcher.start());
            int rEnd = splitAtTextPosition(paragraph, matcher.end());
            removeRunsRange(paragraph, rStart, rEnd);
            return new Pair<>(rStart, matcher.group());
        } else {
            return new Pair<>(-1, "");
        }
    }

    void applyParagraph(XWPFParagraph paragraph) {
        int lastIf = -1;
        while (true) {
            var tIf = extractToken(pIf, paragraph);
            if (tIf.key == -1) {
                break;
            }
            if (tIf.key < lastIf) {
                throw new IllegalStateException("If conditions can not be nested");
            }
            var tEIf = extractToken(pEIf, paragraph);
            if (tEIf.key == -1) {
                throw new IllegalStateException("If condition missing endif");
            }
            var m = pIf.matcher(tIf.value);
            var keep = m.find() && processor.apply(m.group(1));
            if (!keep) {
                // The block content sits in the runs [tIf.key, tEIf.key)
                removeRunsRange(paragraph, tIf.key, tEIf.key);
            }
            // After a removal the runs shift left, so the block now ends at tIf.key
            lastIf = keep ? tEIf.key : tIf.key;
        }
    }

    void apply(Iterable<XWPFParagraph> paragraphs) {
        for (XWPFParagraph p : paragraphs) {
            applyParagraph(p);
        }
    }
}
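To see why working on text offsets sidesteps the run problem, here is a small sketch of the same idea as splitAtTextPosition without POI. Each String stands in for the text of one run. RunSplitSketch, splitAt and removeRange are made-up names for illustration, and returning runs.size() for an offset at the very end of the text is a choice made in this sketch.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical stand-in: each String plays the role of one XWPFRun's text.
class RunSplitSketch {
    // Splits the "run" containing the given text offset and returns the index
    // of the run that starts at that offset (runs.size() for the end of the text).
    static int splitAt(List<String> runs, int position) {
        int offset = 0;
        for (int i = 0; i < runs.size(); i++) {
            String text = runs.get(i);
            if (position >= offset + text.length()) {
                offset += text.length();
                continue;
            }
            runs.set(i, text.substring(0, position - offset));
            runs.add(i + 1, text.substring(position - offset));
            return i + 1;
        }
        return position == offset ? runs.size() : -1;
    }

    // Removes the characters [start, end) of the joined text, whatever runs they are in.
    static void removeRange(List<String> runs, int start, int end) {
        int from = splitAt(runs, start);
        int to = splitAt(runs, end);
        runs.subList(from, to).clear();
    }

    public static void main(String[] args) {
        // The marker is scattered over three runs, as Word might store it.
        List<String> runs = new ArrayList<>(List.of("text $", "{if co", "ndition} more"));
        String token = "${if condition}";
        int start = String.join("", runs).indexOf(token);
        removeRange(runs, start, start + token.length());
        System.out.println(runs); // the token is gone; the text on both sides remains
    }
}
```

The token is located in the joined text, so it does not matter where the run boundaries fall. The split only has to turn the two offsets into run boundaries before removing what lies between them.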
Usage:
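import java.io.FileInputStream;
import java.io.FileOutputStream;
import java.io.IOException;

import org.apache.poi.openxml4j.exceptions.InvalidFormatException;
import org.apache.poi.openxml4j.opc.OPCPackage;
import org.apache.poi.xwpf.usermodel.XWPFDocument;

class Main {
    // Loads a .docx from the classpath; the stream is closed once the package is read
    private static XWPFDocument loadDoc(String name) throws IOException, InvalidFormatException {
        String path = Main.class.getClassLoader().getResource(name).getPath();
        try (FileInputStream fis = new FileInputStream(path)) {
            return new XWPFDocument(OPCPackage.open(fis));
        }
    }

    private static void saveDoc(String path, XWPFDocument doc) throws IOException {
        try (var fos = new FileOutputStream(path)) {
            doc.write(fos);
        }
    }

    public static void main(String[] args) throws Exception {
        var xdoc = loadDoc("test.docx");
        // Keeps a block only when its condition is literally "true" (any case)
        var pm = new ParagraphModifier(str -> "true".equalsIgnoreCase(str));
        pm.apply(xdoc.getParagraphs());
        saveDoc("test.out.docx", xdoc);
    }
}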
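The function passed to ParagraphModifier receives the word that follows `if` and returns whether the block is kept. In the usage above it only compares that word with "true". A lookup table is one possible alternative. The sketch below is an illustration only: ConditionLookup, fromMap and the condition name are hypothetical, and treating unknown names as false is an assumption.

```java
import java.util.Map;
import java.util.function.Function;

// Hypothetical helper, not part of the answer's code or of Apache POI.
class ConditionLookup {
    // Builds a processor that keeps a block only when its condition name maps to true.
    // Names missing from the map are treated as false (an assumption of this sketch).
    static Function<String, Boolean> fromMap(Map<String, Boolean> conditions) {
        return name -> conditions.getOrDefault(name, false);
    }
}
```

It would be plugged in as `new ParagraphModifier(ConditionLookup.fromMap(Map.of("showDetails", true)))`, so that `${if showDetails}` blocks are kept and all others are removed.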
This solution does not support ${if ...} blocks that span several paragraphs, nested ifs, or table structures. Extending it to support them should be straightforward.
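For nesting and for blocks that span paragraphs, one possible direction is to scan the tokens in document order and keep a stack of keep/drop decisions that survives from one paragraph to the next. The sketch below works on plain strings, one per paragraph, rather than on POI objects. BlockFilter and its method names are hypothetical, and mapping the kept character ranges back onto runs would still need the run splitting shown above.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.List;
import java.util.function.Function;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical sketch: paragraphs are modelled as plain strings.
class BlockFilter {
    // Matches either ${if name} (group 1 = name) or ${endif} (group 1 = null)
    private static final Pattern TOKEN =
            Pattern.compile("\\$\\{if\\s+(\\w+)\\}|\\$\\{endif\\}");

    static List<String> filter(List<String> paragraphs, Function<String, Boolean> processor) {
        // One entry per open ${if}; it survives across paragraph boundaries
        Deque<Boolean> visible = new ArrayDeque<>();
        List<String> out = new ArrayList<>();
        for (String paragraph : paragraphs) {
            StringBuilder sb = new StringBuilder();
            Matcher m = TOKEN.matcher(paragraph);
            int last = 0;
            while (m.find()) {
                if (isVisible(visible)) {
                    sb.append(paragraph, last, m.start());
                }
                if (m.group(1) != null) {
                    // A nested block is visible only if its parent is visible too
                    visible.push(isVisible(visible) && processor.apply(m.group(1)));
                } else {
                    if (visible.isEmpty()) {
                        throw new IllegalStateException("endif without if");
                    }
                    visible.pop();
                }
                last = m.end();
            }
            if (isVisible(visible)) {
                sb.append(paragraph, last, paragraph.length());
            }
            out.add(sb.toString()); // a fully hidden paragraph becomes an empty string
        }
        if (!visible.isEmpty()) {
            throw new IllegalStateException("If condition missing endif");
        }
        return out;
    }

    private static boolean isVisible(Deque<Boolean> stack) {
        return stack.isEmpty() || stack.peek();
    }
}
```

Because each pushed value already includes the visibility of its parent, the top of the stack alone says whether the current text is kept. Whether a paragraph that ends up empty should also be removed from the document is left open here.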