I have this structure in my HTML document:
<p>
"<em>You</em> began the evening well, Charlotte," said Mrs. Bennet with civil self–command to Miss Lucas. "<em>You</em> were Mr. Bingley's first choice."
</p>
But I need my "plain text" to be wrapped in tags so that I can process it :)
<p>
<text>"</text>
<em>You</em>
<text> began the evening well, Charlotte," said Mrs. Bennet with civil self–command to Miss Lucas. "</text>
<em>You</em>
<text> were Mr. Bingley's first choice."</text>
</p>
Any ideas how to accomplish this? I've looked at TagSoup and jsoup, but I don't see a way to solve this easily. Maybe with some fancy regexp?
Thanks
Here's a suggestion:
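import org.jsoup.nodes.Element;
import org.jsoup.nodes.Node;
import org.jsoup.nodes.TextNode;
import org.jsoup.parser.Tag;

// Builds a <text> element that contains the given string.
public static Node toTextElement(String str) {
    Element e = new Element(Tag.valueOf("text"), "");
    e.appendText(str);
    return e;
}

// Recursively replaces every text node at or below root with a <text> element.
public static void replaceTextNodes(Node root) {
    if (root instanceof TextNode) {
        root.replaceWith(toTextElement(((TextNode) root).text()));
    } else {
        for (Node child : root.childNodes())
            replaceTextNodes(child);
    }
}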
Test code:
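// Additional imports needed by the test code:
import org.jsoup.Jsoup;
import org.jsoup.nodes.Document;

String html = "<p>\"<em>You</em> began the evening well, Charlotte,\" " +
        "said Mrs. Bennet with civil self–command to Miss Lucas." +
        " \"<em>You</em> were Mr. Bingley's first choice.\"</p>";
Document doc = Jsoup.parse(html);
// Process each element that is a direct child of <body>.
for (Node n : doc.body().children())
    replaceTextNodes(n);
System.out.println(doc);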
Output:
<html>
<head></head>
<body>
<p>
<text>
"
</text><em>
<text>
You
</text></em>
<text>
began the evening well, Charlotte," said Mrs. Bennet with civil self–command to Miss Lucas. "
</text><em>
<text>
You
</text></em>
<text>
were Mr. Bingley's first choice."
</text></p>
</body>
</html>
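Note that the output above also wraps the text inside each <em>, whereas the markup asked for in the question leaves <em>You</em> untouched. If only the text sitting directly under the <p> should be wrapped, the recursion can be dropped. Below is a minimal sketch of that idea. It deliberately uses the JDK's built-in org.w3c.dom API rather than jsoup, so it assumes the fragment is well-formed XML; the class name DirectTextWrapper and its method names are made up for illustration. With jsoup, iterating over Element.textNodes() (which returns only the direct text children) may allow the same approach.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.Node;
import org.xml.sax.InputSource;

// Hypothetical helper class, not part of jsoup or of the answer above.
final class DirectTextWrapper {

    // Parses a string as XML. Assumption: the input is well-formed
    // (real-world HTML often is not, which is where jsoup helps).
    static Document parseXml(String xml) {
        try {
            return DocumentBuilderFactory.newInstance()
                    .newDocumentBuilder()
                    .parse(new InputSource(new StringReader(xml)));
        } catch (Exception e) {
            throw new IllegalArgumentException("Input is not well-formed XML", e);
        }
    }

    // Wraps each text node that is a DIRECT child of parent in a <text> element.
    // Text nested inside other elements (such as <em>) is left alone.
    static void wrapDirectTextNodes(Element parent) {
        // Snapshot the children first, because the child list changes as nodes are replaced.
        List<Node> children = new ArrayList<>();
        for (Node n = parent.getFirstChild(); n != null; n = n.getNextSibling()) {
            children.add(n);
        }
        Document doc = parent.getOwnerDocument();
        for (Node child : children) {
            if (child.getNodeType() == Node.TEXT_NODE) {
                Element wrapper = doc.createElement("text");
                parent.replaceChild(wrapper, child); // put <text> where the text node was
                wrapper.appendChild(child);          // then move the text node inside it
            }
        }
    }

    public static void main(String[] args) {
        Document doc = parseXml(
                "<p>\"<em>You</em> began the evening well.\" \"<em>You</em> were first.\"</p>");
        Element p = doc.getDocumentElement();
        wrapDirectTextNodes(p);
        // Print the name and text of each direct child of <p>.
        for (Node n = p.getFirstChild(); n != null; n = n.getNextSibling()) {
            System.out.println(n.getNodeName() + ": " + n.getTextContent());
        }
    }
}
```

The children are copied into a list before any replacement so that the loop never walks a structure it is modifying. Whitespace-only text nodes between elements would be wrapped as well; skip them in the loop if that is not wanted.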