I am trying to access a website using Java HtmlUnit (version 2.20; I can't update it because of company policy). It is a government website, https://www2.aneel.gov.br:443/aplicacoes_liferay/tarifa/, hosted behind Cloudflare.
When I access it with a normal browser, everything works. When I access it via HtmlUnit, Cloudflare starts its "Checking if the site connection is secure" process.
I can get to the next step, "Connection is secure - Proceeding...", but it gets stuck there.
How can I get through this check correctly?
P.S.: There is no way to have my IP put on a whitelist for this. I must go through this verification and redirection on my own.
Thanks in advance!
Here is a code sample:
    BrowserVersion chrome = new BrowserVersion(
            "Netscape",
            "5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/108.0.0.0 Safari/537.36",
            "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/108.0.0.0 Safari/537.36",
            108);
    chrome.setApplicationCodeName("Mozilla");
    chrome.setVendor("Google Inc.");
    chrome.setHtmlAcceptHeader("text/html,application/xhtml+xml,application/xml;q=0.9,image/avif,image/webp,image/apng,*/*;q=0.8,application/signed-exchange;v=b3;q=0.9");
    chrome.setImgAcceptHeader("image/avif,image/webp,image/apng,image/svg+xml,image/*,*/*;q=0.8");
    chrome.setCssAcceptHeader("text/css,*/*;q=0.1");
    chrome.setScriptAcceptHeader("*/*");
    chrome.setBrowserLanguage("en-US,en;q=0.9,pt;q=0.8,mt;q=0.7");
    chrome.setPlatform("Windows");
    chrome.setUserLanguage("pt-BR");
    chrome.setSystemLanguage("pt-BR");
    try (WebClient webClient = new WebClient(chrome)) {
        String url = "https://www2.aneel.gov.br:443/aplicacoes_liferay/tarifa/";
        // WebClient parameters
        webClient.getOptions().setCssEnabled(true);
        webClient.setJavaScriptTimeout(0);
        webClient.getOptions().setThrowExceptionOnScriptError(false);
        webClient.getOptions().setUseInsecureSSL(false);
        webClient.getOptions().setThrowExceptionOnFailingStatusCode(false);
        webClient.getOptions().setTimeout(0);
        CookieManager cookies = new CookieManager();
        cookies.setCookiesEnabled(true);
        webClient.setCookieManager(cookies);
        webClient.getOptions().setJavaScriptEnabled(true);
        webClient.getOptions().setRedirectEnabled(true);
        webClient.waitForBackgroundJavaScript(10000);
        webClient.waitForBackgroundJavaScriptStartingBefore(10000);
        webClient.getCache().setMaxSize(0);
        HtmlPage page = webClient.getPage(url);
        webClient.getRefreshHandler().handleRefresh(page, new URL(url), 10);
        synchronized(page) {
            page.wait(10000);
        }
        URL _url = new URL(url);
        // Re-add the Cloudflare challenge cookie without a max-age
        for(Cookie c : webClient.getCookies(_url)) {
            if (c.getName().contains("cf_chl_2")) {
                Cookie cook = new Cookie(c.getDomain(), c.getName(), c.getValue(), c.getPath(), -1, false);
                webClient.getCookieManager().removeCookie(c);
                webClient.getCookieManager().addCookie(cook);
            }
        }
        // If the challenge page is still showing, log it and request the URL again
        if (page.asXml().contains("Checking if the site connection is secure")) {
            log.info(page.asXml());
            page = webClient.getPage(url);
            webClient.waitForBackgroundJavaScript(10_000);
        }
    }
(I took some parts of this code from this question/answer.)
Here are some parts of the log I have so far.
As you can see, "Checking if the site connection is secure" and "Proceeding..." both appear, but that is as far as it gets (Error 1020, a firewall rule).
P.S.: The manually added cookie replaces a cookie that was rejected with a negative max-age error:
[com.gargoylesoftware.htmlunit.IncorrectnessListenerImpl] (JS executor for com.gargoylesoftware.htmlunit.WebClient@724c8784) set-cookie http-equiv meta tag: invalid cookie 'cf_chl_2=; Max-Age=-99999999;'; reason: 'Negative 'max-age' attribute: -99999999'.
So please, in the name of the Gods of Programming Languages, how can I proceed to the final web page?
Thanks!
There was a fix to the cookie processing in HtmlUnit. The latest 2.68.0-SNAPSHOT and all later releases should include it.
See https://github.com/HtmlUnit/htmlunit/issues/524 for more.
This is also related to the question "HtmlUnit returning empty list of DomElements".
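For context on the cookie warning quoted in the question: RFC 6265 says that a Max-Age of zero or less means the cookie should expire immediately, so `cf_chl_2=; Max-Age=-99999999;` is effectively a request to delete the cookie rather than an invalid cookie. The sketch below only illustrates that rule with plain JDK classes. It is not HtmlUnit's code, and the class and method names (`MaxAgeDemo`, `parse`, `Result`) are made up for this example.

```java
import java.time.Instant;
import java.util.Locale;

public class MaxAgeDemo {

    /** Outcome of reading the Max-Age attribute of one cookie string (hypothetical helper type). */
    static final class Result {
        final String name;
        final boolean deleteNow;  // true when the cookie should be expired immediately
        final Instant expiresAt;  // null for session cookies and for deletions

        Result(String name, boolean deleteNow, Instant expiresAt) {
            this.name = name;
            this.deleteNow = deleteNow;
            this.expiresAt = expiresAt;
        }
    }

    /** Simplified reading of a Set-Cookie style string; only Max-Age is interpreted. */
    static Result parse(String setCookie, Instant now) {
        String[] parts = setCookie.split(";");
        String nameValue = parts[0].trim();
        int eq = nameValue.indexOf('=');
        String name = eq < 0 ? nameValue : nameValue.substring(0, eq).trim();

        Result result = new Result(name, false, null); // default: session cookie
        for (int i = 1; i < parts.length; i++) {
            String attr = parts[i].trim();
            int idx = attr.indexOf('=');
            if (idx < 0) {
                continue; // attribute without a value, e.g. Secure
            }
            String key = attr.substring(0, idx).trim().toLowerCase(Locale.ROOT);
            if (!key.equals("max-age")) {
                continue;
            }
            long seconds;
            try {
                seconds = Long.parseLong(attr.substring(idx + 1).trim());
            } catch (NumberFormatException e) {
                continue; // a malformed Max-Age is ignored
            }
            // Zero or negative Max-Age: expire the cookie right away.
            result = seconds <= 0
                    ? new Result(name, true, null)
                    : new Result(name, false, now.plusSeconds(seconds));
        }
        return result;
    }

    public static void main(String[] args) {
        Result r = parse("cf_chl_2=; Max-Age=-99999999;", Instant.now());
        System.out.println(r.name + " deleteNow=" + r.deleteNow);
    }
}
```

With the cookie string from the log, `parse` reports `cf_chl_2` as a cookie to delete, which is what a browser does with it.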