I want to scrape a website, but I can't use jsoup because jsoup does not execute JavaScript. I am trying to run HtmlUnit version 3.3.0 in my Android app, but in my activity class WebClient is not recognized. Can someone please tell me how to solve this?
Here's the simple code I am trying to run:
private void scrapeWebsite() {
    String targetUrl = "https://example.com"; // Replace with the URL of the website you want to scrape
    try (WebClient webClient = new WebClient()) {
        // Step 1: Enable JavaScript for the WebClient (Important for handling JavaScript challenges)
        webClient.getOptions().setJavaScriptEnabled(true);
        // Step 2: Use HTMLUnit to bypass JavaScript challenges and get the webpage content
        HtmlPage page = webClient.getPage(targetUrl);
        // Step 3: Get the page content as text
        String pageContent = page.asText();
        // Display the scraped data in the TextView
        scrapedDataTextView.setText(pageContent);
    } catch (IOException e) {
        e.printStackTrace();
        scrapedDataTextView.setText("Error occurred while scraping");
    }
}
I would also like to know whether HtmlUnit is a good approach for scraping a website in an Android app, and whether it works there at all.
There is a special version of HtmlUnit for Android because there are some problems with the Android JDK.
See https://github.com/HtmlUnit/htmlunit-android
implementation group: 'org.htmlunit', name: 'htmlunit3-android', version: '3.3.0'
Also, please make sure you are using the latest version of htmlunit3-android. With that version you have to use
page.asNormalizedText();
instead of page.asText().
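Putting the two points above together, a minimal sketch of the scraping code could look like the following. It assumes the htmlunit3-android dependency is on the classpath and that its classes live in the org.htmlunit package, as in HtmlUnit 3.x; the class name ScrapeSketch and the helper fetchPageText are made up for illustration, and https://example.com is just the placeholder URL from the question.

```java
import java.io.IOException;

// Assumption: HtmlUnit 3.x package names (org.htmlunit), provided by htmlunit3-android.
import org.htmlunit.WebClient;
import org.htmlunit.html.HtmlPage;

public class ScrapeSketch {

    // Hypothetical helper: loads targetUrl with JavaScript enabled
    // and returns the page's text content.
    static String fetchPageText(String targetUrl) throws IOException {
        // try-with-resources closes the WebClient and its windows when done
        try (WebClient webClient = new WebClient()) {
            webClient.getOptions().setJavaScriptEnabled(true);
            HtmlPage page = webClient.getPage(targetUrl);
            // asNormalizedText() is the replacement for asText()
            return page.asNormalizedText();
        }
    }

    public static void main(String[] args) throws IOException {
        // Placeholder URL taken from the question
        System.out.println(fetchPageText("https://example.com"));
    }
}
```

In an Android activity you would probably call such a helper from a background thread and only pass the returned string to scrapedDataTextView on the UI thread, since Android does not allow network access on the main thread.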
(Running a sample similar to yours is part of the release testing, so I'm really sure the latest version works for this on Android.)
If you are still facing errors, please open an issue and provide the URL you would like to scrape, to give me a chance to reproduce the problem.