
When using HtmlUnit, how can I configure the underlying NekoHtml parser?

I'm using HtmlUnit to scrape a webpage because of its JavaScript support. (I'd rather use Jsoup, but it has no JS support.)

The issue relates to a feature of the underlying NekoHtml parser: ""


This can apparently be enabled in Neko, but I'm using HtmlUnit. Is there a way to configure the Neko parser that HtmlUnit uses internally so that this feature is enabled?

When attempting to run this code:

    final WebClient webClient = new WebClient();
    HtmlPage page = webClient.getPage(url.toString());

I'm getting this error:

    Caused by: com.gargoylesoftware.htmlunit.ObjectInstantiationException: unable to create HTML parser
        at com.gargoylesoftware.htmlunit.html.HTMLParser$HtmlUnitDOMBuilder.<init>(
        at com.gargoylesoftware.htmlunit.html.HTMLParser$HtmlUnitDOMBuilder.<init>(
        at com.gargoylesoftware.htmlunit.html.HTMLParser.parse(
        at com.gargoylesoftware.htmlunit.html.HTMLParser.parseHtml(
        at com.gargoylesoftware.htmlunit.DefaultPageCreator.createHtmlPage(
        at com.gargoylesoftware.htmlunit.DefaultPageCreator.createPage(
        at com.gargoylesoftware.htmlunit.WebClient.loadWebResponseInto(
        at com.gargoylesoftware.htmlunit.WebClient.getPage(
        at com.gargoylesoftware.htmlunit.WebClient.getPage(
        at com.gargoylesoftware.htmlunit.WebClient.getPage(
    Caused by: org.xml.sax.SAXNotRecognizedException: Feature '' is not recognized.
        at org.apache.xerces.parsers.AbstractSAXParser.setFeature(Unknown Source)
        at com.gargoylesoftware.htmlunit.html.HTMLParser$HtmlUnitDOMBuilder.<init>(
        ... 41 more
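The second "Caused by" is the telling part: while constructing its DOM builder, HtmlUnit calls AbstractSAXParser.setFeature on the Xerces SAX parser, and the parser rejects the feature URI because it is not in its set of recognized features. The JDK-only sketch below reproduces that class of failure with the JDK's built-in SAX parser, which can be handy for probing whether a parser accepts a feature before relying on it. The class and method names are hypothetical, and the example.com URI is a made-up stand-in, not the Neko feature from the question.

```java
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.SAXNotRecognizedException;
import org.xml.sax.SAXNotSupportedException;
import org.xml.sax.XMLReader;

public class FeatureCheck {

    /**
     * Tries to switch on a SAX feature and reports whether the parser accepted it.
     * Returns false if the parser throws SAXNotRecognizedException (unknown URI)
     * or SAXNotSupportedException (known URI, but the value cannot be set now).
     */
    public static boolean trySetFeature(XMLReader reader, String featureUri) {
        try {
            reader.setFeature(featureUri, true);
            return true;
        } catch (SAXNotRecognizedException | SAXNotSupportedException e) {
            System.out.println("Rejected: " + e.getMessage());
            return false;
        }
    }

    /** Creates an XMLReader backed by the JDK's default SAX parser implementation. */
    public static XMLReader newReader() throws Exception {
        return SAXParserFactory.newInstance().newSAXParser().getXMLReader();
    }

    public static void main(String[] args) throws Exception {
        XMLReader reader = newReader();
        // Standard SAX feature that conforming parsers recognize
        System.out.println(trySetFeature(reader, "http://xml.org/sax/features/namespaces"));
        // Hypothetical URI standing in for a parser-specific feature this parser lacks
        System.out.println(trySetFeature(reader, "http://example.com/features/hypothetical"));
    }
}
```

On a stock JDK this should print true for the standard namespaces feature, then a "Rejected: ..." line and false for the made-up URI; the rejection message should have the same "Feature '...' is not recognized." shape as the one in the trace above.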


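Solved: rather than configuring Neko directly, I created a custom BrowserVersion that enables HtmlUnit's HTMLIFRAME_IGNORE_SELFCLOSING feature and passed it to the WebClient constructor: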

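        import com.gargoylesoftware.htmlunit.BrowserVersion;
        import com.gargoylesoftware.htmlunit.BrowserVersionFeatures;
        import com.gargoylesoftware.htmlunit.WebClient;

        // HtmlUnit's own flag for ignoring self-closing <iframe/> tags
        BrowserVersionFeatures[] bvf = {
                BrowserVersionFeatures.HTMLIFRAME_IGNORE_SELFCLOSING
        };
        // Args: application name, application version, user agent, numeric version, features
        BrowserVersion bv = new BrowserVersion(
                BrowserVersion.NETSCAPE, "5.0 (Windows; en-US)",
                "Mozilla/5.0 (Windows; U; Windows NT 5.1; en-US; rv: Gecko/20100722 Firefox/3.6.8",
                3.6f, bvf);
        WebClient webClient = new WebClient(bv);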