html-parsing htmlcleaner

Getting the cleaned HTML as text from HtmlCleaner


I want to see the cleaned HTML that HtmlCleaner produces. I see there is a method called serialize on TagNode, but I don't know how to use it. Does anybody have sample code for it?

Thanks, Nayn


Solution

  • Here's the sample code:

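    import org.htmlcleaner.HtmlCleaner;
    import org.htmlcleaner.TagNode;

    HtmlCleaner htmlCleaner = new HtmlCleaner();

    // url is a java.net.URL; clean() also accepts a String, File, or InputStream
    TagNode root = htmlCleaner.clean(url);

    // getInnerHtml() is an instance method and returns only the markup inside the node,
    // so wrap it in the root tag to get the whole cleaned document as a string
    String html = "<" + root.getName() + ">" + htmlCleaner.getInnerHtml(root) + "</" + root.getName() + ">";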