I am trying to use jTidy to extract data from real-world HTML, but jTidy doesn't parse custom tags. For example:
<html>
<body>
<myCustomTag>some text</myCustomTag>
<anotherCustom>more text</anotherCustom>
</body>
</html>
I can't get the text between the custom tags. I have to use jTidy because I'll be querying the result with XPath.
I tried HTMLCleaner, but it doesn't support the full set of XPath functions.
You can also set the properties using a Java Properties object, for example:
import java.util.Properties;
import org.w3c.tidy.Tidy;

// Declare the custom block-level tags that jTidy should accept
Properties oProps = new Properties();
oProps.setProperty("new-blocklevel-tags", "header hgroup article footer nav");
Tidy tidy = new Tidy();
tidy.setConfigurationFromProps(oProps);
This should save you having to create and load a configuration file.
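Once the custom tags survive parsing, the XPath side only needs a W3C DOM Document, which is what jTidy's parseDOM returns. Here is a minimal sketch of that querying step, using only the JDK. To keep it runnable without jTidy on the classpath, it uses the standard DocumentBuilder as a stand-in parser on already well-formed markup; the class name CustomTagXPath and the helpers parse and textOf are made up for illustration.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.xpath.XPath;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Document;
import org.xml.sax.InputSource;

class CustomTagXPath {

    // Stand-in for jTidy's parseDOM: builds a W3C DOM from well-formed markup.
    // With jTidy you would obtain the Document from the configured Tidy instance instead.
    static Document parse(String markup) {
        try {
            return DocumentBuilderFactory.newInstance()
                    .newDocumentBuilder()
                    .parse(new InputSource(new StringReader(markup)));
        } catch (Exception e) {
            throw new IllegalStateException("Could not parse markup", e);
        }
    }

    // Evaluates an XPath expression and returns the string value of the first match.
    static String textOf(Document doc, String expression) {
        try {
            XPath xpath = XPathFactory.newInstance().newXPath();
            return xpath.evaluate(expression, doc);
        } catch (Exception e) {
            throw new IllegalStateException("Could not evaluate " + expression, e);
        }
    }

    public static void main(String[] args) {
        // Same markup as in the question
        String markup = "<html><body>"
                + "<myCustomTag>some text</myCustomTag>"
                + "<anotherCustom>more text</anotherCustom>"
                + "</body></html>";
        Document doc = parse(markup);
        System.out.println(textOf(doc, "//myCustomTag"));
        System.out.println(textOf(doc, "//anotherCustom"));
    }
}
```

The same textOf call should work unchanged on a Document produced by jTidy, as long as the element names in the XPath expression match the names in the DOM that jTidy builds.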