html parsing jtidy

How to add new tags to JTidy?


I am trying to use JTidy to extract data from real-world HTML, but JTidy doesn't parse custom tags.

<html>
  <body>
    <myCustomTag>some text</myCustomTag>
    <anotherCustom>more text</anotherCustom>
  </body>
</html>

I can't get the text between the custom tags. I have to use JTidy because I'll be using XPath.

I tried HtmlCleaner, but it doesn't support the full set of XPath functions.


Solution

  • You can register the new tags by setting the properties through a Java Properties object, for example:

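    import java.util.Properties;
    import org.w3c.tidy.Tidy;

    // List the custom tags, space-separated, under the "new-blocklevel-tags" option.
    Properties oProps = new Properties();
    oProps.setProperty("new-blocklevel-tags", "header hgroup article footer nav");

    // Apply the properties to the Tidy instance before parsing.
    Tidy tidy = new Tidy();
    tidy.setConfigurationFromProps(oProps);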
    

    This should save you having to create and load a configuration file.
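
For the tags in the question, the same property could be built from a list of tag names. The following is only a sketch: `CustomTagProps` and `blockLevel` are hypothetical names and not part of JTidy, and it assumes the custom tags should be treated as block-level, as in the example above. It uses only the standard library, so the JTidy calls appear as comments.

```java
import java.util.List;
import java.util.Properties;

// Hypothetical helper (not part of JTidy): builds the Properties for custom tags.
final class CustomTagProps {

    // Joins the tag names into the space-separated value used above.
    static Properties blockLevel(List<String> tagNames) {
        Properties props = new Properties();
        props.setProperty("new-blocklevel-tags", String.join(" ", tagNames));
        return props;
    }

    public static void main(String[] args) {
        // Tag names taken from the question; assumed to be block-level.
        Properties props = blockLevel(List.of("myCustomTag", "anotherCustom"));
        System.out.println(props.getProperty("new-blocklevel-tags"));

        // With JTidy on the classpath, hand the result to Tidy as in the answer:
        //   Tidy tidy = new Tidy();
        //   tidy.setConfigurationFromProps(props);
    }
}
```

Keeping the tag names in a list rather than a hand-written string makes it easier to add further custom tags later without touching the Tidy setup.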