How to create a list of the items under Pick a Category (US) from amzscout, which renders differently through HtmlUnitDriver and the HtmlUnit headless browser?
Using the GeckoDriver / Firefox and ChromeDriver / Chrome combinations, I am able to create the list and print it as follows:
Code trial:
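import java.util.List;
import org.openqa.selenium.*;
import org.openqa.selenium.firefox.FirefoxDriver;
import org.openqa.selenium.support.ui.ExpectedConditions;
import org.openqa.selenium.support.ui.WebDriverWait;

System.setProperty("webdriver.gecko.driver", "C:/Utility/BrowserDrivers/geckodriver.exe");
WebDriver driver = new FirefoxDriver();
driver.get("https://amzscout.net/sales-estimator");
// wait until the category name spans are visible, then print each one
List<WebElement> elements = new WebDriverWait(driver, 10).until(ExpectedConditions.visibilityOfAllElementsLocatedBy(By.cssSelector("span.cat-pick_name-in")));
for (WebElement ele : elements)
    System.out.println(ele.getAttribute("innerHTML"));
driver.quit();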
Console Output:
Appliances
Arts, Crafts & Sewing
Automotive
...
But with HtmlUnitDriver and the HtmlUnit headless browser, the HTML seems to render differently, as follows:
The full HTML is on pastebin.
The relevant part of the HTML is:
<script type="application/ld+json">
//<![CDATA[
{
"@context": "http://schema.org/",
"@type": "Product",
"name": "AMZScout Sales Estimator",
"image": "",
"brand": "AMZScout",
"aggregateRating": {
"@type": "AggregateRating",
"ratingValue": "4.7",
"bestRating": "5",
"worstRating": "1",
"ratingCount": "231"
}
}
//]]>
</script>
<script type="text/javascript" src="/js/common.js">
</script>
<script type="text/javascript">
//<![CDATA[
const DATA = {
COM: [
["Appliances", "s-cat-icon-appliances"],
["Arts, Crafts & Sewing", "s-cat-icon-craft"],
["Automotive", "s-cat-icon-automotive"],
["Baby", "s-cat-icon-baby"],
["Beauty & Personal Care", "s-cat-icon-beauty"],
["Books", "s-cat-icon-books"],
["Camera & Photo", "s-cat-icon-camera"],
["Cell Phones & Accessories", "s-cat-icon-phone"],
["Clothing, Shoes & Jewelry", "s-cat-icon-clothing"],
["Computers & Accessories", "s-cat-icon-computers"],
["Electronics", "s-cat-icon-electronics"],
["Grocery & Gourmet Food", "s-cat-icon-food"],
["Health & Household", "s-cat-icon-health"],
["Home and Garden", "s-cat-icon-home"],
["Home & Kitchen", "s-cat-icon-kitchen"],
["Industrial & Scientific", "s-cat-icon-gear"],
["Jewelry", "s-cat-icon-jewelry"],
["Kindle Store", "s-cat-icon-kindle"],
["Kitchen & Dining", "s-cat-icon-dining"],
["Musical Instruments", "s-cat-icon-musical-instruments"],
["Office Products", "s-cat-icon-office"],
["Patio, Lawn & Garden", "s-cat-icon-lawn"],
["Pet Supplies", "s-cat-icon-pet-food"],
["Shoes", "s-cat-icon-shoes"],
["Software", "s-cat-icon-software"],
["Sports & Outdoors", "s-cat-icon-sports"],
["Tools & Home Improvement", "s-cat-icon-repairs"],
["Toys & Games", "s-cat-icon-toys"],
["Watches", "s-cat-icon-watches"],
["Video Games", "s-cat-icon-joystick"]
],
CO_UK: [
["Baby", "s-cat-icon-baby"],
The DATA object is referenced within:
$(function () {
  var rankInput = $('.cat-rank_input');
  function toggleRank(e) { var cats = $('.cat-pick'); var rank = $('.cat-rank'); var list = rank.find('.cat-pick_list'); var $el = $(e.currentTarget).clone(); $el.on('click', toggleRank).css('cursor', 'pointer'); list.empty(); list.append($el); category = $el.find('.cat-pick_name-in').text(); rankInput.val(''); cats.toggle(); rank.toggle(); if ($(window).width() >= 768) { var catsHeight = cats.height(); rank.height(catsHeight); } if (rank.is(':visible')) { val.text('?'); setTimeout(function () {rankInput.focus()}, 0); } }
  function selectDomain(d) {
    const data = DATA[d]; const list = $('.cat-pick .cat-pick_list'); list.empty();
    data.filter(function (d) {return d[1] != ''}).forEach(function (d) {
      var el = $('<div class="cat-pick_i"><span class="cat-pick_link"><span class="cat-pick_ico"><span></span></span><span class="cat-pick_name"><span class="cat-pick_name-in"></span></span></span></div>');
      el.find('.cat-pick_ico span').addClass(d[1]); el.find('.cat-pick_name-in').text(d[0]); el.on('click', toggleRank); list.append(el);
    });
    domain = d;
  }
  rankInput.on('change', function () {rank = rankInput.val()});
  rankInput.on('keyup', function(e) {e.keyCode == 13 && (rank = rankInput.val()) && getEstSales()});
  $('.cat-rank_another-link').on('click', toggleRank);
  $('#domain').on('change', function (e) {selectDomain(e.target.value);}); selectDomain(domain);
});
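The list-building logic in selectDomain() boils down to: take DATA[d], drop every entry whose icon class d[1] is empty, and emit one span.cat-pick_name-in holding the name d[0] for each remaining entry. A minimal, stdlib-only Java sketch of that same logic can read the names straight from the script text with a regular expression, without executing any JavaScript. CategoryExtractor and extractCategoryNames are hypothetical names; the sketch assumes the DATA literal has the shape of the excerpt above (keys such as COM followed by arrays of ["Name", "icon-class"] pairs, with no escaped quotes inside the strings) and that COM is the key behind the US category list.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class CategoryExtractor {

    // One ["Name", "icon-class"] pair; assumes no escaped quotes inside the strings.
    private static final Pattern PAIR =
            Pattern.compile("\\[\\s*\"([^\"]*)\"\\s*,\\s*\"([^\"]*)\"\\s*\\]");

    // Returns the names listed under DATA[domainKey] in the page source, skipping
    // entries with an empty icon class (mirrors d[1] != '' in selectDomain).
    public static List<String> extractCategoryNames(String pageSource, String domainKey) {
        List<String> names = new ArrayList<>();
        Matcher key = Pattern
                .compile("\\b" + Pattern.quote(domainKey) + "\\s*:\\s*\\[")
                .matcher(pageSource);
        if (!key.find()) {
            return names; // key not present in this page
        }
        // Walk forward from the opening '[' until the outer array closes,
        // ignoring brackets that appear inside double-quoted strings.
        int open = key.end() - 1;
        int depth = 0;
        int end = -1;
        boolean inString = false;
        for (int i = open; i < pageSource.length(); i++) {
            char c = pageSource.charAt(i);
            if (c == '"') {
                inString = !inString;
            } else if (!inString && c == '[') {
                depth++;
            } else if (!inString && c == ']' && --depth == 0) {
                end = i;
                break;
            }
        }
        if (end < 0) {
            return names; // unterminated array, e.g. a truncated page
        }
        Matcher pair = PAIR.matcher(pageSource.substring(open + 1, end));
        while (pair.find()) {
            if (!pair.group(2).isEmpty()) {
                names.add(pair.group(1));
            }
        }
        return names;
    }

    public static void main(String[] args) {
        String sample = "const DATA = {\n"
                + "COM: [\n"
                + "[\"Appliances\", \"s-cat-icon-appliances\"],\n"
                + "[\"Arts, Crafts & Sewing\", \"s-cat-icon-craft\"]\n"
                + "],\n"
                + "CO_UK: [\n"
                + "[\"Baby\", \"s-cat-icon-baby\"]\n"
                + "]};";
        extractCategoryNames(sample, "COM").forEach(System.out::println);
    }
}
```

Because this treats the page as plain text, it is tied to the exact shape of the DATA literal and will stop working if the site changes how it ships the categories; executing the page's JavaScript avoids that coupling.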
Can anyone help me out please?
As you already figured out, the items you are looking for are created by JavaScript. This means you have to enable JavaScript support in HtmlUnitDriver.
The second point is to wait until the JavaScript has finished. You are using 'visibilityOfAllElementsLocatedBy', and its documentation states:
An expectation for checking that all elements present on the web page that match the locator are visible.
Once some, but not yet all, of the elements matching your selector exist (because JavaScript is still creating new ones), this condition can already be true. Because of this, I changed the wait condition slightly so that it really waits until all the elements are created.
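To see why a visibility check can fire too early, here is a toy, stdlib-only Java model of a polling wait. It is not Selenium's implementation: Element, allVisible, countMoreThan and pollUntil are hypothetical names, and the simulated page simply adds one category per poll. allVisible only inspects the elements that exist at the moment it runs, so it succeeds as soon as the first item appears, while the count-based condition keeps polling until all 30 are there.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Predicate;

public class WaitConditionDemo {

    // Simplified stand-in for a DOM element that may or may not be displayed.
    public record Element(String name, boolean displayed) {}

    // Rough model of "all found elements are visible": it only looks at the
    // elements present right now, so a partially built list passes.
    public static boolean allVisible(List<Element> found) {
        return !found.isEmpty() && found.stream().allMatch(Element::displayed);
    }

    // Rough model of "number of elements to be more than n".
    public static Predicate<List<Element>> countMoreThan(int n) {
        return found -> found.size() > n;
    }

    // Polls a simulated page that gains one element per tick until the
    // condition holds; returns a snapshot of what was found at that moment.
    public static List<Element> pollUntil(Predicate<List<Element>> condition, int total, int maxTicks) {
        List<Element> dom = new ArrayList<>();
        for (int tick = 0; tick < maxTicks; tick++) {
            if (dom.size() < total) {
                dom.add(new Element("category-" + dom.size(), true)); // script adds one item
            }
            if (condition.test(dom)) {
                return new ArrayList<>(dom);
            }
        }
        throw new IllegalStateException("timed out");
    }

    public static void main(String[] args) {
        int total = 30; // the COM array in the question has 30 entries
        System.out.println(pollUntil(WaitConditionDemo::allVisible, total, 100).size());
        System.out.println(pollUntil(countMoreThan(total - 1), total, 100).size());
    }
}
```

With total = 30, main prints 1 for the visibility check and 30 for the count check, which is the difference the changed wait condition addresses.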
My final code looks like this and creates exactly the list you are expecting:
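import java.util.List;
import org.openqa.selenium.By;
import org.openqa.selenium.WebDriver;
import org.openqa.selenium.WebElement;
import org.openqa.selenium.htmlunit.HtmlUnitDriver;
import org.openqa.selenium.support.ui.ExpectedConditions;
import org.openqa.selenium.support.ui.WebDriverWait;

String url = "https://amzscout.net/sales-estimator";
// true enables JavaScript support
WebDriver driver = new HtmlUnitDriver(true);
try {
    driver.get(url);
    // wait until the elements are created: the COM list in DATA has 30 entries,
    // so wait for more than 29 matching spans
    List<WebElement> elements =
        new WebDriverWait(driver, 10)
            .until(ExpectedConditions
                .numberOfElementsToBeMoreThan(
                    By.cssSelector("span.cat-pick_name-in"), 29));
    System.out.println();
    for (WebElement ele : elements) {
        System.out.println(ele.getAttribute("innerHTML"));
    }
} finally {
    driver.quit();
}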
Hope that helps.