I am having trouble retrieving the list of DOM elements when using the getElementsByName method of HtmlPage.
Here is the HTML page. (I am trying to get CategoriaAgente from the select tag.)
HTML (the part that I need):
<select name="CategoriaAgente">
    <option value="-">Escolha uma categoria</option>
    <option value="t">Todos</option>
    <option value="p">Permissionária de Distribuição</option>
    <option value="d">Concessionária de Distribuição</option>
</select>
Snippet of the Java code (Using HtmlUnit):
public List<HtmlOption> listaAgentes() {
    List<HtmlOption> listaAgentes = null;
    try (WebClient webClient = new WebClient()) {
        log.info("COLETANDO AGENTES");
        // WebClient parameters
        webClient.setJavaScriptTimeout(15000);
        webClient.getOptions().setThrowExceptionOnScriptError(false);
        webClient.getOptions().setUseInsecureSSL(true);
        webClient.getOptions().setThrowExceptionOnFailingStatusCode(false);
        webClient.getOptions().setTimeout(300000);
        String url = "https://www2.aneel.gov.br/aplicacoes_liferay/tarifa/";
        HtmlPage page = webClient.getPage(url);
        // select the agent category (CategoriaAgente)
        List<DomElement> listaCategoriaAgente = page.getElementsByName("CategoriaAgente");
        //...
The list listaCategoriaAgente is ALWAYS empty.
I tried some solutions found on S.O., but none of them worked.
Help? Thanks in advance!
EDIT: After the comment from @hooknc, I found that the site is presenting some kind of CAPTCHA challenge from Cloudflare. This is what I get from Postman....
Does anyone know how to bypass this challenge-form using HtmlUnit?
Thanks!
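One way to confirm from code that an empty list is caused by the challenge page, and not by a wrong element name, is to look for the challenge markers in the output of page.asXml(). This is a minimal sketch, assuming the markers are the strings that appear in this post ("Checking if the site connection is secure", challenge-form, challenge-success). The class and method names are made up for illustration, and Cloudflare may change these strings at any time.

```java
import java.util.List;

final class ChallengeCheck {
    // Markers copied from the challenge page shown in this post.
    // Assumption: this is not an official or complete list.
    private static final List<String> MARKERS = List.of(
            "Checking if the site connection is secure",
            "challenge-form",
            "challenge-success");

    /** Returns true when the page source looks like the interstitial rather than the real page. */
    static boolean looksLikeChallenge(String pageXml) {
        if (pageXml == null) {
            return false;
        }
        return MARKERS.stream().anyMatch(pageXml::contains);
    }

    public static void main(String[] args) {
        String challenge = "<div id=\"challenge-success\" style=\"display: none;\">";
        String real = "<select name=\"CategoriaAgente\"><option value=\"t\">Todos</option></select>";
        System.out.println(looksLikeChallenge(challenge)); // true
        System.out.println(looksLikeChallenge(real));      // false
    }
}
```

With HtmlUnit the argument would be page.asXml(), checked before calling getElementsByName.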
EDIT 2:
Well, I think I made some progress(?)...
This is the code so far....
try (WebClient webClient = new WebClient(BrowserVersion.CHROME)) {
    webClient.getOptions().setCssEnabled(false);
    webClient.setJavaScriptTimeout(0);
    webClient.getOptions().setThrowExceptionOnScriptError(false);
    webClient.getOptions().setUseInsecureSSL(true);
    webClient.getOptions().setThrowExceptionOnFailingStatusCode(false);
    webClient.getOptions().setTimeout(0);
    webClient.getCookieManager().setCookiesEnabled(true);
    webClient.getOptions().setJavaScriptEnabled(true);
    webClient.getOptions().setRedirectEnabled(true);
    webClient.getCache().setMaxSize(0);
    webClient.waitForBackgroundJavaScript(10_000);
    webClient.waitForBackgroundJavaScriptStartingBefore(10_000);
    HtmlPage page = null;
    String url = null;
    url = "https://www2.aneel.gov.br/aplicacoes_liferay/tarifa/";
    page = webClient.getPage(url);
    if (page.asXml().contains("Checking if the site connection is secure")) {
        log.info(page.asXml());
        synchronized(page) {
            page.wait(10_000);
        }
        webClient.waitForBackgroundJavaScript(10_000);
    }
And... this is what I get from the log...
<div id="challenge-success" style="display: none;">
<div class="h2">
<span class="icon-wrapper">
<img class="heading-icon" alt="Success icon" src="data:image/png;base64,iVBORw0KGgoAAAANSUhEUgAAADQAAAA0CAMAAADypuvZAAAANlBMVEUAAAAxMTEwMDAxMTExMTEwMDAwMDAwMDAxMTExMTExMTEwMDAwMDAxMTExMTEwMDAwMDAxMTHB9N+uAAAAEXRSTlMA3zDvfyBAEJC/n3BQz69gX7VMkcMAAAGySURBVEjHnZZbFoMgDEQJiDzVuv/NtgbtFGuQ4/zUKpeMIQbUhXSKE5l1XSn4pFWHRm/WShT1HRLWC01LGxFEVkCc30eYkLJ1Sjk9pvkw690VY6k8DWP9OM9yMG0Koi+mi8XA36NXmW0UXra4eJ3iwHfrfXVlgL0NqqGBHdqfeQhMmyJ48WDuKP81h3+SMPeRKkJcSXiLUK4XTHCjESOnz1VUXQoc6lgi2x4cI5aTQ201Mt8wHysI5fc05M5c81uZEtHcMKhxZ7iYEty1GfhLvGKpm+EYkdGxm1F5axmcB93DoORIbXfdN7f+hlFuyxtDP+sxtBnF43cIYwaZAWRgzxIoiXEMESoPlMhwLRDXeK772CAzXEdBRV7cmnoVBp0OSlyGidEzJTFq5hhcsA5388oSGM6b5p+qjpZrBlMS9xj4AwXmz108ukU1IomM3ceiW0CDwHCqp1NjAqXlFrbga+xuloQJ+tuyfbIBPNpqnmxqT7dPaOnZqBfhSBCteJAxWj58zLk2xgg+SPGYM6dRO6WczSnIxxwEExRaO+UyCUhbOp7CGQ+kxSUfNtLQFC+Po29vvy7jj4y0yAAAAABJRU5ErkJggg=="/>
</span>
Connection is secure
</div>
<div class="core-msg spacer">
Proceeding...
</div>
</div>
So... it says Proceeding... but nothing happens. I waited for a long time, but it just stays stuck on Proceeding...
Any thoughts? Thanks!
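A side note on the waiting itself: page.wait(10_000) only blocks the current thread for a fixed time, and the page variable keeps pointing at the same HtmlPage object even if the window later loads a different page. A bounded polling loop that re-checks a condition is usually easier to reason about. The sketch below is plain Java with a stand-in condition; BoundedWait and waitUntil are hypothetical names, not HtmlUnit API, and polling alone would not be expected to get past the challenge if the underlying problem is elsewhere.

```java
import java.util.function.BooleanSupplier;

final class BoundedWait {
    /**
     * Polls the condition until it is true or the timeout elapses.
     * Returns true if the condition became true in time, false on timeout or interrupt.
     */
    static boolean waitUntil(BooleanSupplier condition, long timeoutMillis, long pollMillis) {
        long deadline = System.nanoTime() + timeoutMillis * 1_000_000L;
        while (true) {
            if (condition.getAsBoolean()) {
                return true;
            }
            if (System.nanoTime() - deadline >= 0) {
                return false;
            }
            try {
                Thread.sleep(pollMillis);
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt(); // preserve the interrupt for the caller
                return false;
            }
        }
    }

    public static void main(String[] args) {
        long start = System.currentTimeMillis();
        // Stand-in condition that becomes true after about 200 ms. With HtmlUnit the
        // condition could instead re-read webClient.getCurrentWindow().getEnclosedPage()
        // on every poll and check that the challenge text is gone.
        boolean ready = waitUntil(() -> System.currentTimeMillis() - start >= 200, 2_000, 50);
        System.out.println(ready);    // true
        boolean neverReady = waitUntil(() -> false, 300, 50);
        System.out.println(neverReady); // false
    }
}
```

Re-reading the enclosed page on each poll matters because a check against the original page object can never see a page that replaced it.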
Well, this is what happened. I posted a related question, and someone (possibly from the HtmlUnit team) pushed an update to Git to solve the cookie problem.
When I used that updated version (2.68.0-SNAPSHOT; I also had to update the version of apache-commons-lang3), all the problems disappeared. Cloudflare accepted the connection and everything worked! Here is the final version of the code:
try (WebClient webClient = new WebClient(BrowserVersion.CHROME)) {
    String url = "https://www2.aneel.gov.br:443/aplicacoes_liferay/tarifa/";
    // WebClient parameters
    webClient.getOptions().setCssEnabled(true);
    webClient.setJavaScriptTimeout(0);
    webClient.getOptions().setThrowExceptionOnScriptError(false);
    webClient.getOptions().setUseInsecureSSL(true);
    webClient.getOptions().setThrowExceptionOnFailingStatusCode(false);
    webClient.getOptions().setTimeout(0);
    webClient.getOptions().setJavaScriptEnabled(true);
    webClient.getOptions().setRedirectEnabled(true);
    CookieManager cookies = new CookieManager();
    cookies.setCookiesEnabled(true);
    webClient.setCookieManager(cookies);
    webClient.setAjaxController(new NicelyResynchronizingAjaxController());
    webClient.waitForBackgroundJavaScript(10000);
    webClient.waitForBackgroundJavaScriptStartingBefore(10000);
    webClient.getCache().setMaxSize(0);
    // silence HtmlUnit and HttpClient logging
    java.util.logging.Logger.getLogger("com.gargoylesoftware").setLevel(java.util.logging.Level.OFF);
    java.util.logging.Logger.getLogger("com.gargoylesoftware.htmlunit").setLevel(java.util.logging.Level.OFF);
    java.util.logging.Logger.getLogger("org.apache.commons.httpclient").setLevel(java.util.logging.Level.OFF);
    HtmlPage page = webClient.getPage(url);
    webClient.getRefreshHandler().handleRefresh(page, new URL(url), 10);
    synchronized(page) {
        page.wait(10000);
    }
    if (page.asXml().contains("Checking if the site connection is secure")) {
        log.info(page.asXml());
        webClient.waitForBackgroundJavaScript(10_000);
    }
    List<DomElement> listaCategoriaAgente = page.getElementsByName("CategoriaAgente");
With the update and this piece of code, the list of DOM elements I needed came back properly. Thank you all for the assist!
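A small note on the three Logger.getLogger(...).setLevel(...) lines in the final snippet: java.util.logging may garbage-collect a Logger that nobody holds a strong reference to, which can silently undo the level change, and a child logger such as com.gargoylesoftware.htmlunit already inherits the level of com.gargoylesoftware. Here is a minimal sketch that keeps the loggers in a static field. QuietLoggers is a made-up name, and this only has an effect if the libraries' logging actually ends up in java.util.logging, which is an assumption about the classpath.

```java
import java.util.List;
import java.util.logging.Level;
import java.util.logging.Logger;

final class QuietLoggers {
    // Strong references: java.util.logging may garbage-collect a Logger that
    // nobody holds, which would drop the level set on it.
    private static final List<Logger> SILENCED = List.of(
            Logger.getLogger("com.gargoylesoftware"),
            Logger.getLogger("org.apache.commons.httpclient"));

    /** Turns off all output for the loggers listed above and for their children. */
    static void silence() {
        for (Logger logger : SILENCED) {
            logger.setLevel(Level.OFF);
        }
    }

    public static void main(String[] args) {
        silence();
        // The child logger has no level of its own, so it inherits OFF from its parent.
        Logger child = Logger.getLogger("com.gargoylesoftware.htmlunit");
        System.out.println(child.isLoggable(Level.SEVERE)); // false
    }
}
```

Calling QuietLoggers.silence() once before creating the WebClient would take the place of the three inline calls.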