I want to scrape links to ads on the website https://www.supralift.com/uk/itemsearch/results, which uses a JavaScript pager. My plan is to collect the links on the page, click the "Next" button on the pager to load the next page, collect the links again, and so on.
Below is the code I wrote, but it does not work: I am not able to trigger the pager to load the next page.
How do I use the `read_html_live()` function correctly?
Many thanks in advance for any advice.
```r
library(tidyverse)
library(rvest)

ad_links_all <- tibble() # We will collect here all ad links
n_of_pages <- 10 # For testing purposes we want to scrape just this number of first pages

page <- read_html_live("https://www.supralift.com/uk/itemsearch/results")

for (i in 1:n_of_pages) {
  # Scrape the links
  page_ad_links <- page %>%
    html_elements(".product-info .h-full a:has(.line-clamp-2)") %>%
    html_attr("href") %>%
    str_split(., "\\?searchKey=", n = 2) %>%
    lapply(., `[[`, 1) %>%
    unlist() %>%
    tibble()
  # Collect all links here
  ad_links_all <- ad_links_all %>% bind_rows(page_ad_links)
  # Move to the next page
  page$click(".pagination-direction-btn:nth-child(1)")
  Sys.sleep(.5)
}
#> Error in `private$wait_for_selector()`:
#> ! Failed to find selector ".pagination-direction-btn:nth-child(1)" in 5
#> seconds.
```
<sup>Created on 2024-09-30 with [reprex v2.1.1](https://reprex.tidyverse.org)</sup>
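In the code above, the `?searchKey=` query is removed from each `href` with `str_split()` followed by `lapply()` and `unlist()`. The same cleanup can be written as a single vectorised `sub()` call. This is a minimal base-R sketch; `strip_search_key()` is a hypothetical helper name and the example hrefs are made up, not scraped from the site.

```r
# Hypothetical helper (not part of rvest or stringr): drop everything from the
# first "?searchKey=" onwards; hrefs without that query are returned unchanged.
strip_search_key <- function(hrefs) {
  sub("\\?searchKey=.*$", "", hrefs)
}

# Made-up example input
hrefs <- c(
  "https://example.com/ad-1?searchKey=abc",
  "https://example.com/ad-2"
)
strip_search_key(hrefs)
```

Because `sub()` is vectorised, it returns a character vector of the same length as its input, so no `lapply()`/`unlist()` round trip is needed.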
Maybe a front-end developer can chime in, but it seems that you first need to scroll down to the element to get it rendered, perhaps due to some AngularJS behaviour.
The following CSS selectors turned out a bit crude, but they work on my machine.
Strangely, most `LiveHTML` methods (e.g. `$click()` and `$scroll_into_view()`) do not accept XPath.
```r
library(rvest)
library(tibble)
library(stringr)

n_of_pages <- 2
page <- read_html_live("https://www.supralift.com/uk/itemsearch/results")

# list for links
ad_links_all <- vector(mode = "list", length = n_of_pages)

for (i in seq(n_of_pages)) {
  # scroll down to update/render elements,
  # as a side effect it also waits for 10th item-preview to appear
  # (though apparently not something to always rely on)
  page$scroll_into_view("mfe-application-item-preview:nth-of-type(10)")
  Sys.sleep(5)

  # report current page
  html_elements(page, xpath = "//mfe-application-navigation/div/div/div/span") |>
    html_text() |>
    paste(collapse = " ") |>
    message()

  ad_links_all[[i]] <-
    page |>
    html_elements(".product-info .h-full a:has(.line-clamp-2)") |>
    html_attr("href") |>
    str_split_i("\\?searchKey=", 1)

  page$click("mfe-application-navigation > div > div > div:nth-child(3) > button:nth-child(1)")
}
#> 1 From 2491
#> 2 From 2491
```
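The loop above pauses for a fixed `Sys.sleep(5)` after scrolling, and its own comments note that the implicit wait in `scroll_into_view()` is not something to always rely on. A polling wait, which re-checks a condition until it holds or a timeout expires, is a common alternative to fixed sleeps. This is a minimal base-R sketch; `wait_until()` is a hypothetical helper, not part of rvest, and it has not been tested against the live site.

```r
# Hypothetical polling helper (not part of rvest): call `cond()` repeatedly
# until it returns TRUE or `timeout` seconds have elapsed.
# Returns TRUE if the condition was met, FALSE on timeout.
wait_until <- function(cond, timeout = 10, interval = 0.25) {
  deadline <- Sys.time() + timeout
  repeat {
    if (isTRUE(cond())) return(TRUE)
    if (Sys.time() >= deadline) return(FALSE)
    Sys.sleep(interval)
  }
}
```

Inside the loop, one could store the pager text (e.g. "1 From 2491") before `page$click()` and then call `wait_until()` with a condition that re-reads that text via `html_elements()` and returns `TRUE` once it differs, instead of sleeping for a fixed time.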
Results, with the per-page link vectors combined into one tibble:
```r
ad_links_all |>
  lapply(\(lnks) tibble(href = lnks)) |>
  dplyr::bind_rows(.id = "page")
#> # A tibble: 20 × 2
#>    page  href                                                                   
#>    <chr> <chr>                                                                  
#>  1 1     https://www.supralift.com/uk/used-forklifts/combilift-combilift-wfc-ot…
#>  2 1     https://www.supralift.com/uk/used-forklifts/jungheinrich-efg-216k-464d…
#>  3 1     https://www.supralift.com/uk/used-forklifts/linde-e-12-evo-386-02-3%20…
#>  4 1     https://www.supralift.com/uk/used-forklifts/still-rx60-40-batt.neu-4%2…
#>  5 1     https://www.supralift.com/uk/used-forklifts/still-fm-x14-reach%20truck…
#>  6 1     https://www.supralift.com/uk/used-forklifts/still-fm-x17n-reach%20truc…
#>  7 1     https://www.supralift.com/uk/used-forklifts/still-cop-l07-order%20pick…
#>  8 1     https://www.supralift.com/uk/used-forklifts/still-rx20-20p-4%20wheel%2…
#>  9 1     https://www.supralift.com/uk/used-forklifts/still-rx20-20pl-4%20wheel%…
#> 10 1     https://www.supralift.com/uk/used-forklifts/still-rx60-35-4%20wheel%20…
#> 11 2     https://www.supralift.com/uk/used-forklifts/still-exv-sf14-pallet%20st…
#> 12 2     https://www.supralift.com/uk/used-forklifts/still-rx60-50-batt.neu-4%2…
#> 13 2     https://www.supralift.com/uk/used-forklifts/still-rx60-30-4%20wheel%20…
#> 14 2     https://www.supralift.com/uk/used-forklifts/still-rx60-35-4%20wheel%20…
#> 15 2     https://www.supralift.com/uk/used-forklifts/still-exv-sf14-pallet%20st…
#> 16 2     https://www.supralift.com/uk/used-forklifts/still-fm-x17n-reach%20truc…
#> 17 2     https://www.supralift.com/uk/used-forklifts/still-rx20-16c-3%20wheel%2…
#> 18 2     https://www.supralift.com/uk/used-forklifts/still-rx20-15-3%20wheel%20…
#> 19 2     https://www.supralift.com/uk/used-forklifts/still-cop-l07-order%20pick…
#> 20 2     https://www.supralift.com/uk/used-forklifts/still-rx70-30t-4%20wheel%2…
```
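The `lapply()` + `tibble()` + `bind_rows()` step can also be sketched in base R without tibble or dplyr. `links_to_df()` is a hypothetical helper name. Unlike `bind_rows(.id = "page")`, which produces a character `page` column, this version stores the page number as an integer.

```r
# Hypothetical base-R helper: turn a list of per-page link vectors into one
# data frame with a page column. Pages that returned no links add no rows.
links_to_df <- function(links_by_page) {
  data.frame(
    page = rep(seq_along(links_by_page), lengths(links_by_page)),
    href = as.character(unlist(links_by_page, use.names = FALSE)),
    stringsAsFactors = FALSE
  )
}
```

Applied to `ad_links_all` from the answer's loop, `links_to_df(ad_links_all)` would give the same 20 rows as the tibble above, with `page` as an integer.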