I am scraping a website with the HtmlUnit library, following this workflow.
At the last step, a table is displayed and refreshed from the server every 30 s. In the Chrome DevTools Network tab (Fetch/XHR filter) I found the AJAX requests behind it, sent to the endpoint www.mysite.com/events. My first approach was to make a plain HTTP request to that endpoint after logging in, but unfortunately the request payload is very complex and contains dynamic information that I can't reproduce.
So my second approach is to listen for these requests every time they are issued.
This is how I am doing it:
// WebClient instance config
webClient.getOptions().setJavaScriptEnabled(false);
webClient.getOptions().setCssEnabled(false);
webClient.getOptions().setDoNotTrackEnabled(false);
webClient.getOptions().setThrowExceptionOnScriptError(false);
webClient.setAjaxController(new NicelyResynchronizingAjaxController());

// log in
HtmlPage loginPage = webClient.getPage("https://www.mysite.com/login");
HtmlForm form = loginPage.getForms().get(0);
form.getInputByName("email").type(username);
form.getInputByName("password").type(password);
HtmlButton button = (HtmlButton) loginPage.getElementById("submit");
button.click();

// log every request made while on https://www.mysite.com/list
// (the wrapper registers itself as the client's WebConnection)
new WebConnectionWrapper(webClient) {
    @Override
    public WebResponse getResponse(final WebRequest request) throws IOException {
        final WebResponse response = super.getResponse(request);
        logger.info(request.getUrl().toString());
        return response;
    }
};
webClient.getPage("https://www.mysite.com/list");
webClient.waitForBackgroundJavaScript(7000);
But I am not able to catch the AJAX request (/events URI) that brings the data to refresh the grid.
Is something missing? Thanks
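Because the wrapper logs every request the client makes, picking out only the grid refresh is a matter of filtering. A minimal sketch, assuming the endpoint is exactly /events on the placeholder host www.mysite.com from the question; EventsMatcher is a hypothetical helper, not part of HtmlUnit. It compares only host and path, so dynamic query-string parameters do not break the match (dynamic data in the request body never appears in the URL anyway).

```java
import java.net.URI;

/**
 * Hypothetical helper: decides whether a request URL is the grid-refresh
 * endpoint by comparing host and path only, ignoring the query string.
 */
final class EventsMatcher {
    // Placeholder host and path taken from the question; adjust to the real site.
    private static final String HOST = "www.mysite.com";
    private static final String PATH = "/events";

    private EventsMatcher() {}

    static boolean isEventsRequest(String url) {
        URI uri;
        try {
            uri = URI.create(url);
        } catch (IllegalArgumentException e) {
            return false; // not a parseable URI, so not the endpoint
        }
        String host = uri.getHost();
        String path = uri.getPath();
        if (host == null || path == null) {
            return false; // e.g. opaque URIs such as "mailto:..."
        }
        // Accept "/events" with or without a trailing slash.
        return host.equalsIgnoreCase(HOST)
                && (path.equals(PATH) || path.equals(PATH + "/"));
    }
}
```

Inside the wrapper's getResponse it could be used as if (EventsMatcher.isEventsRequest(request.getUrl().toString())) { ... }, since WebRequest.getUrl() returns a java.net.URL.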
This works for me (HtmlUnit 2.70.0):
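import java.io.IOException;

import com.gargoylesoftware.htmlunit.WebClient;
import com.gargoylesoftware.htmlunit.WebRequest;
import com.gargoylesoftware.htmlunit.WebResponse;
import com.gargoylesoftware.htmlunit.html.HtmlPage;
import com.gargoylesoftware.htmlunit.util.WebConnectionWrapper;

public class AjaxListenerDemo {
    public static void main(String[] args) throws Exception {
        String url = "https://js-tutorials.com/demos/jqgrid_jquery_example_demo/";
        // JavaScript stays enabled (the default); only script errors are tolerated
        try (final WebClient webClient = new WebClient()) {
            webClient.getOptions().setThrowExceptionOnScriptError(false);

            // The wrapper installs itself as the client's WebConnection, so every
            // request the page makes (including XHR) passes through getResponse
            new WebConnectionWrapper(webClient) {
                @Override
                public WebResponse getResponse(final WebRequest request) throws IOException {
                    final WebResponse response = super.getResponse(request);
                    System.out.println(request.getUrl().toString());
                    // dump the body of the AJAX call that feeds the grid
                    if (request.getUrl().toString().startsWith("https://jsonplaceholder.typicode.com/posts")) {
                        System.out.println("-----");
                        System.out.println(response.getContentAsString());
                        System.out.println("-----");
                    }
                    return response;
                }
            };

            HtmlPage page = webClient.getPage(url);
            // give the page's scripts time to issue and process their requests
            webClient.waitForBackgroundJavaScript(10_000);
            // re-read the current page in case scripts replaced it
            page = (HtmlPage) webClient.getCurrentWindow().getEnclosedPage();
            System.out.println("----------------");
            System.out.println(page.asNormalizedText());
            System.out.println("----------------");
        }
    }
}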
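The answer prints the matching response body from inside the wrapper. If the grid data is needed elsewhere, one option is to hand each body to the calling thread through a blocking queue. A minimal sketch; EventsCapture and its methods are hypothetical names, and the wrapper side would call capture.offer(response.getContentAsString()) when a request matches. HtmlUnit runs background JavaScript (timers, XHR callbacks) on its own executor thread, which is why a thread-safe queue is used. Since the grid refreshes every 30 s, the timeout passed to awaitNext should comfortably exceed 30 s (for example 35 s); a 7 s or 10 s wait may end before a refresh that only fires on the 30 s timer.

```java
import java.time.Duration;
import java.util.Optional;
import java.util.concurrent.LinkedBlockingQueue;
import java.util.concurrent.TimeUnit;

/**
 * Hypothetical helper: thread-safe hand-off between the WebConnectionWrapper
 * (which sees every response) and the code that wants the grid data.
 */
final class EventsCapture {
    private final LinkedBlockingQueue<String> bodies = new LinkedBlockingQueue<>();

    /** Intended to be called from getResponse(...) for each matching response body. */
    void offer(String body) {
        if (body != null) { // LinkedBlockingQueue rejects null elements
            bodies.offer(body);
        }
    }

    /** Waits up to the given timeout for the next captured body; empty if none arrived. */
    Optional<String> awaitNext(Duration timeout) {
        try {
            return Optional.ofNullable(bodies.poll(timeout.toMillis(), TimeUnit.MILLISECONDS));
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt(); // preserve the interrupt status
            return Optional.empty();
        }
    }
}
```

This only helps if HtmlUnit actually runs the page script that issues the request. With setJavaScriptEnabled(false), as in the question's configuration, no page JavaScript executes, so no /events request is ever made; the answer's code leaves JavaScript enabled, which is HtmlUnit's default.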