I am using HtmlCleaner with ColdFusion. The code below traverses the node tree looking for content nodes. What I want to do is modify the text content of each content node it finds.
node.traverse(new TagNodeVisitor() {
    public boolean visit(TagNode tagNode, HtmlNode htmlNode) {
        if (htmlNode instanceof ContentNode) {
            ContentNode content = ((ContentNode) htmlNode);
            String textContent = content.getContent();
        }
        // tells visitor to continue traversing the DOM tree
        return true;
    }
});
The example I am using is:
// traverse whole DOM and update images to absolute URLs
node.traverse(new TagNodeVisitor() {
    public boolean visit(TagNode tagNode, HtmlNode htmlNode) {
        if (htmlNode instanceof TagNode) {
            TagNode tag = (TagNode) htmlNode;
            String tagName = tag.getName();
            if ("img".equals(tagName)) {
                String src = tag.getAttributeByName("src");
                if (src != null) {
                    tag.setAttribute("src", Utils.fullUrl(siteUrl, src));
                }
            }
        } else if (htmlNode instanceof CommentNode) {
            CommentNode comment = ((CommentNode) htmlNode);
            comment.getContent().append(" -- By HtmlCleaner");
        }
        // tells visitor to continue traversing the DOM tree
        return true;
    }
});
What I want to do is grab the text content between the HTML tags so that I can translate it into another language without touching the HTML tags, images, etc. This is what I have so far:
node.traverse(new TagNodeVisitor() {
    public boolean visit(TagNode tagNode, HtmlNode htmlNode) {
        if (htmlNode instanceof ContentNode) {
            ContentNode content = ((ContentNode) htmlNode);
            String text = content.getContent();
            try {
                // #arguments.FromLanguage# and #arguments.ToLanguage# are ColdFusion placeholders for the language codes
                String strUrl = "http://translate.google.com/translate_a/t?client=t&sl=#arguments.FromLanguage#&tl=#arguments.ToLanguage#&hl=#arguments.ToLanguage#&sc=2&ie=UTF-8&oe=UTF-8&oc=1&otf=1&ssel=0&tsel=0&q=" + URLEncoder.encode(text, "UTF-8");
                URLConnection urlConn = new URL(strUrl).openConnection();
                urlConn.addRequestProperty("User-Agent",
                        "Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.0)");
                StringBuilder newContent = new StringBuilder(1000);
                try (Reader reader = new InputStreamReader(urlConn.getInputStream(), "utf-8")) {
                    JsonArray gRet = new Gson().fromJson(reader, JsonArray.class);
                    // assumes each entry of the first array is itself an array
                    // whose first element is a translated segment
                    for (JsonElement el : gRet.get(0).getAsJsonArray()) {
                        newContent.append(el.getAsJsonArray().get(0).getAsString());
                    }
                }
                // swap the original text node for the translated one
                tagNode.insertChildAfter(htmlNode, new ContentNode(newContent.toString()));
                tagNode.removeChild(htmlNode);
            } catch (IOException e) {
                // visit() cannot declare checked exceptions, so rethrow unchecked
                throw new RuntimeException(e);
            }
        }
        // tells visitor to continue traversing the DOM tree
        return true;
    }
});
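Since the goal is to change only the text and leave the markup alone, it may help to keep the string handling separate from the tree handling. The following is a minimal sketch under stated assumptions, not part of HtmlCleaner: the class name TextNodeRewriter, the method rewrite, and the upper-casing stand-in for the translation call are all hypothetical. It skips whitespace-only text (so no request is made for nodes that only hold line breaks or indentation) and keeps the leading and trailing whitespace of the original text around the translated part.

```java
import java.util.Locale;
import java.util.function.UnaryOperator;

// Hypothetical helper, not part of HtmlCleaner: rewrites the text of one content node.
final class TextNodeRewriter {

    private TextNodeRewriter() {
    }

    /**
     * Applies translator to the non-whitespace core of text and puts the
     * original leading and trailing whitespace back around the result.
     * Null and whitespace-only input is returned unchanged.
     */
    static String rewrite(String text, UnaryOperator<String> translator) {
        if (text == null) {
            return null;
        }
        // find the first and last non-whitespace characters
        int start = 0;
        while (start < text.length() && text.charAt(start) <= ' ') {
            start++;
        }
        int end = text.length();
        while (end > start && text.charAt(end - 1) <= ' ') {
            end--;
        }
        if (start == end) {
            return text; // nothing to translate
        }
        return text.substring(0, start)
                + translator.apply(text.substring(start, end))
                + text.substring(end);
    }

    public static void main(String[] args) {
        // stand-in for the real translation request (assumption for the demo)
        UnaryOperator<String> fakeTranslator = s -> s.toUpperCase(Locale.ROOT);
        System.out.println("[" + rewrite("  hello world ", fakeTranslator) + "]");
        System.out.println("[" + rewrite("   ", fakeTranslator) + "]");
    }
}
```

Inside visit(), the text taken from the ContentNode would be passed to rewrite together with a function that wraps the translation request, and the returned string would be used to build the replacement ContentNode. How the function is wired to the translation service is left open here.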