java cloudflare htmlunit

Java HtmlUnit: Cloudflare protection gets stuck when redirecting to the final page


I am trying to access a website using Java HtmlUnit (version 2.20; I can't update it because of company policy). It is a government website, https://www2.aneel.gov.br:443/aplicacoes_liferay/tarifa/, hosted behind Cloudflare.

When I access it with a normal browser, everything works. When I access it via HtmlUnit, Cloudflare starts its "Checking if the site connection is secure" process.

I can get to the next step, "Connection is secure - Proceeding...", but it gets stuck there. How can I get past it correctly?

P.S.: There is no way to get my IP whitelisted for this. I have to go through this verification and redirection myself.

Thanks in advance!

Some code sample:

BrowserVersion chrome = new BrowserVersion(
            "Netscape", 
            "5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/108.0.0.0 Safari/537.36", 
            "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/108.0.0.0 Safari/537.36", 
            108);
    chrome.setApplicationCodeName("Mozilla");
    chrome.setVendor("Google Inc.");
    chrome.setHtmlAcceptHeader("text/html,application/xhtml+xml,application/xml;q=0.9,image/avif,image/webp,image/apng,*/*;q=0.8,application/signed-exchange;v=b3;q=0.9");
    chrome.setImgAcceptHeader("image/avif,image/webp,image/apng,image/svg+xml,image/*,*/*;q=0.8");
    chrome.setCssAcceptHeader("text/css,*/*;q=0.1");
    chrome.setScriptAcceptHeader("*/*");
    chrome.setBrowserLanguage("en-US,en;q=0.9,pt;q=0.8,mt;q=0.7");
    chrome.setPlatform("Windows");
    chrome.setUserLanguage("pt-BR");
    chrome.setSystemLanguage("pt-BR");
    
    try (WebClient webClient = new WebClient(chrome)) {
        String url = "https://www2.aneel.gov.br:443/aplicacoes_liferay/tarifa/";
        
        // WebClient parameters
        webClient.getOptions().setCssEnabled(true);
        webClient.setJavaScriptTimeout(0);
        webClient.getOptions().setThrowExceptionOnScriptError(false);
        webClient.getOptions().setUseInsecureSSL(false);
        webClient.getOptions().setThrowExceptionOnFailingStatusCode(false);
        webClient.getOptions().setTimeout(0);
        
        CookieManager cookies = new CookieManager();
        
        cookies.setCookiesEnabled(true);
        webClient.setCookieManager(cookies);
        
        
        webClient.getOptions().setJavaScriptEnabled(true);
        webClient.getOptions().setRedirectEnabled(true);
        
        webClient.waitForBackgroundJavaScript(10000);
        webClient.waitForBackgroundJavaScriptStartingBefore(10000);
        
        webClient.getCache().setMaxSize(0);
        
        HtmlPage page = webClient.getPage(url);
        webClient.getRefreshHandler().handleRefresh(page, new URL(url), 10);
        
        synchronized(page) {
            page.wait(10000);
        }
        
        URL _url = new URL(url);
        
        for(Cookie c : webClient.getCookies(_url)) {
            if (c.getName().contains("cf_chl_2")) {
                
                Cookie cook = new Cookie(c.getDomain(), c.getName(), c.getValue(), c.getPath(), -1, false);
                webClient.getCookieManager().removeCookie(c);
                webClient.getCookieManager().addCookie(cook);
            }
        }
        
        if (page.asXml().contains("Checking if the site connection is secure")) {
            log.info(page.asXml());
            
            page = webClient.getPage(url);
            webClient.waitForBackgroundJavaScript(10_000);
        }
    }

(I took some parts of this code from another question/answer.)

Some parts of the log I got so far...

[image: screenshot of the log output]

As you can see, "Checking if the site connection is secure" and "Proceeding..." both appear, but that is all that happens. (Error 1020, firewall stuff...)

P.S.: The manually added cookie in the code replaces a cookie that was triggering a negative max-age error:

[com.gargoylesoftware.htmlunit.IncorrectnessListenerImpl] (JS executor for com.gargoylesoftware.htmlunit.WebClient@724c8784) set-cookie http-equiv meta tag: invalid cookie 'cf_chl_2=; Max-Age=-99999999;'; reason: 'Negative 'max-age' attribute: -99999999'.
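
For context, the rejected cookie looks like a deletion request: as far as I understand RFC 6265 (section 5.2.2), a Max-Age that is zero or negative means the cookie expires immediately, so a browser removes `cf_chl_2` instead of reporting an error. The following is only a minimal sketch of that behaviour using a plain map as a cookie jar. The class and method names (`CookieDeletionSketch`, `isDeletion`, `apply`) are hypothetical and are not part of HtmlUnit, and this is not necessarily how HtmlUnit handles it internally.

```java
import java.util.HashMap;
import java.util.Map;

// Illustrative only: a tiny cookie jar, NOT HtmlUnit's CookieManager.
class CookieDeletionSketch {

    // Returns true if the Set-Cookie value carries a Max-Age <= 0,
    // which (assumption: RFC 6265 semantics) means "expire now".
    static boolean isDeletion(String setCookieValue) {
        String[] parts = setCookieValue.split(";");
        // parts[0] is the name=value pair; attributes start at index 1.
        for (int i = 1; i < parts.length; i++) {
            String[] kv = parts[i].trim().split("=", 2);
            if (kv.length == 2 && kv[0].trim().equalsIgnoreCase("Max-Age")) {
                try {
                    return Long.parseLong(kv[1].trim()) <= 0;
                } catch (NumberFormatException e) {
                    return false; // an unparseable Max-Age is ignored here
                }
            }
        }
        return false;
    }

    // Applies one Set-Cookie value to the jar: removes the cookie on a
    // deletion request, otherwise stores or overwrites it.
    static void apply(Map<String, String> jar, String setCookieValue) {
        String[] nameValue = setCookieValue.split(";", 2)[0].trim().split("=", 2);
        String name = nameValue[0].trim();
        String value = nameValue.length == 2 ? nameValue[1].trim() : "";
        if (isDeletion(setCookieValue)) {
            jar.remove(name);
        } else {
            jar.put(name, value);
        }
    }

    public static void main(String[] args) {
        Map<String, String> jar = new HashMap<>();
        apply(jar, "cf_chl_2=abc123; Path=/");        // made-up value, stored
        apply(jar, "cf_chl_2=; Max-Age=-99999999;");  // the cookie from the log
        System.out.println(jar.containsKey("cf_chl_2")); // prints "false"
    }
}
```

Under that reading, re-adding the cookie by hand keeps a cookie that Cloudflare's script wanted cleared, which may be part of why the challenge never completes.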

So please, in the name of the Gods of Programming Languages, how can I proceed to the final web page?

Thanks!!!!


Solution

  • There was an update/fix to the cookie processing in HtmlUnit. The latest 2.68.0-SNAPSHOT and all later releases should include it.

    See https://github.com/HtmlUnit/htmlunit/issues/524 for more.

    This is also related to the question "HtmlUnit returning empty list of DomElements".