java ajax htmlunit

Catch all ajax requests (grid refresh) using HtmlUnit


I am scraping a website with the HtmlUnit library, following this workflow:

  1. Go to the root site, e.g. www.mysite.com
  2. Log in (www.mysite.com/login)
  3. Open the page where the table/grid is located (www.mysite.com/list)

On the last page, a table is refreshed from the server every 30 s. I can see those AJAX requests (fetch/XHR) in the Network tab of Chrome's developer tools; they go to the www.mysite.com/events endpoint. My first approach was to send a plain HTTP request after the login, but unfortunately the payload of this request is very complex and contains dynamic information that I can't reproduce.

So my second approach is to listen for the request each time it is issued.

This is how I am doing it:

 //webclient instance config

 webClient.getOptions().setJavaScriptEnabled(false);
 webClient.getOptions().setCssEnabled(false);
 webClient.getOptions().setDoNotTrackEnabled(false);
 webClient.getOptions().setThrowExceptionOnScriptError(false);
 webClient.setAjaxController(new NicelyResynchronizingAjaxController());

 // do login

 HtmlPage loginPage = webClient.getPage("www.mysite.com/login");
 HtmlForm form = loginPage.getForms().get(0);
 form.getInputByName("email").type(username);
 form.getInputByName("password").type(password);
 HtmlButton button = (HtmlButton) loginPage.getElementById("submit");
 button.click();

 // listen ajax requests on www.mysite.com/list

 new WebConnectionWrapper(webClient) {
     @Override
     public WebResponse getResponse(final WebRequest request) throws IOException {
         final WebResponse response = super.getResponse(request);
         logger.info(request.getUrl().toString());
         return response;
     }
 };

 webClient.getPage("www.mysite.com/list");
 webClient.waitForBackgroundJavaScript(7000);

But I am not able to catch the AJAX request (/events URI) that fetches the data to refresh the grid.

Is something missing? Thanks


Solution

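  • This works here (HtmlUnit 2.70.0). Unlike the code in the question, JavaScript is left enabled (the default); with setJavaScriptEnabled(false) the page's scripts never run, so no AJAX request is issued and the wrapper has nothing to intercept.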

    String url = "https://js-tutorials.com/demos/jqgrid_jquery_example_demo/";
    
    try (final WebClient webClient = new WebClient()) {
        webClient.getOptions().setThrowExceptionOnScriptError(false);
    
        new WebConnectionWrapper(webClient) {
            @Override
            public WebResponse getResponse(final WebRequest request) throws IOException {
                final WebResponse response = super.getResponse(request);
    
                System.out.println(request.getUrl().toString());
    
                if (request.getUrl().toString().startsWith("https://jsonplaceholder.typicode.com/posts")) {
                    System.out.println("-----");
                    System.out.println(response.getContentAsString());
                    System.out.println("-----");
                }
                return response;
            }
        };
    
        HtmlPage page = webClient.getPage(url);
        webClient.waitForBackgroundJavaScript(10_000);
        page = (HtmlPage) webClient.getCurrentWindow().getEnclosedPage();
    
        System.out.println("----------------");
        System.out.println(page.asNormalizedText());
        System.out.println("----------------");
    }
    

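If the intercepted data needs to be used later instead of just printed, one option is to store it from inside `getResponse`. The following is only a minimal sketch using the plain JDK: `AjaxCapture`, `record` and `latestBody` are hypothetical names invented for this illustration (they are not part of HtmlUnit), and the `/events` path is taken from the question. The list is synchronized on the assumption that background JavaScript may call the wrapper from another thread.

```java
import java.net.URI;
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

/** Hypothetical helper (not an HtmlUnit class): keeps bodies of matching responses. */
final class AjaxCapture {
    private final String pathPrefix;
    // Synchronized because the wrapper may be invoked from a background thread.
    private final List<String> bodies = Collections.synchronizedList(new ArrayList<>());

    AjaxCapture(String pathPrefix) {
        this.pathPrefix = pathPrefix;
    }

    /** Stores the body if the URL's path starts with the prefix; returns true if stored. */
    boolean record(String url, String body) {
        final String path;
        try {
            path = URI.create(url).getPath();
        } catch (IllegalArgumentException e) {
            return false; // not a syntactically valid URI, ignore it
        }
        if (path != null && path.startsWith(pathPrefix)) {
            bodies.add(body);
            return true;
        }
        return false;
    }

    /** Number of captured responses so far. */
    int size() {
        return bodies.size();
    }

    /** Most recently captured body, or null if nothing matched yet. */
    String latestBody() {
        synchronized (bodies) {
            return bodies.isEmpty() ? null : bodies.get(bodies.size() - 1);
        }
    }
}
```

Inside the `WebConnectionWrapper` shown above, this could be called as `capture.record(request.getUrl().toString(), response.getContentAsString())` right after `super.getResponse(request)`, and `capture.latestBody()` read once `waitForBackgroundJavaScript` has returned.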