r web-scraping rvest

How to use read_html_live() to navigate through a JavaScript pager?


I want to scrape links to ads from https://www.supralift.com/uk/itemsearch/results, which uses a JavaScript pager. My plan is to collect the links on the current page, click the pager's "Next" button to load the next page, collect the links again, and so on. The code below does not work: I cannot get the pager to load the next page. How do I use read_html_live() correctly? Many thanks in advance for any advice.

  library(tidyverse)
  library(rvest)
  
  ad_links_all <- tibble() # We will collect here all ad links
  n_of_pages <- 10 # For testing purposes we want to scrape just this number of first pages
  
  page <- read_html_live("https://www.supralift.com/uk/itemsearch/results")
  
  for (i in 1:n_of_pages) {
    
    # Scrape the links
    page_ad_links <- page %>% html_elements(".product-info .h-full a:has(.line-clamp-2)") %>% 
      html_attr("href") %>% str_split(., "\\?searchKey=", n = 2) %>% lapply(., `[[`, 1) %>% unlist() %>% tibble()
    
    # Collect all links here
    ad_links_all <- ad_links_all %>% bind_rows(page_ad_links)
    
    # Move to the next page
    page$click(".pagination-direction-btn:nth-child(1)")
    
    Sys.sleep(.5)
    
  }
#> Error in `private$wait_for_selector()`:
#> ! Failed to find selector ".pagination-direction-btn:nth-child(1)" in 5
#>   seconds.


<sup>Created on 2024-09-30 with [reprex v2.1.1](https://reprex.tidyverse.org)</sup>
 


Solution

  • Maybe a frontend dev can chime in, but it seems that you need to scroll down to the pager element first to get it rendered, perhaps some AngularJS thing.

    The CSS selectors below turned out a bit crude, but WorksOnMyMachine(tm).
    Strangely, most LiveHTML methods do not accept XPath.

    library(rvest)
    library(tibble)
    library(stringr)
    library(dplyr)   # for bind_rows(), used when combining the results
    
    n_of_pages <- 2
    page <- read_html_live("https://www.supralift.com/uk/itemsearch/results")
    
    # list for links
    ad_links_all <- vector(mode = "list", length = n_of_pages)
    
    for (i in seq(n_of_pages)){
      # scroll down to update/render elements, 
      # as a side effect it also waits for 10th item-preview to appear 
      # (though apparently not something to always rely on)
      page$scroll_into_view("mfe-application-item-preview:nth-of-type(10)")
      Sys.sleep(5)
      
      # report current page
      html_elements(page, xpath = "//mfe-application-navigation/div/div/div/span") |> 
        html_text() |> 
        paste(collapse = " ") |> 
        message()
      
      ad_links_all[[i]] <-
        page |> 
        html_elements(".product-info .h-full a:has(.line-clamp-2)") |> 
        html_attr("href") |> 
        str_split_i("\\?searchKey=", 1)
      
      page$click("mfe-application-navigation > div > div > div:nth-child(3) > button:nth-child(1)")
    }
    #> 1 From 2491
    #> 2 From 2491
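
The fixed `Sys.sleep(5)` in the loop above could perhaps be swapped for a polling wait, so that the loop moves on as soon as the page has updated. This is only a sketch: `wait_until()` is a hypothetical helper, not part of rvest, and its `timeout` and `interval` defaults are arbitrary assumptions.

```r
# Hypothetical helper (not part of rvest): call `condition` repeatedly
# until it returns TRUE or `timeout` seconds have passed.
wait_until <- function(condition, timeout = 10, interval = 0.25) {
  deadline <- Sys.time() + timeout
  repeat {
    if (isTRUE(condition())) return(TRUE)      # condition met
    if (Sys.time() >= deadline) return(FALSE)  # gave up waiting
    Sys.sleep(interval)
  }
}
```

Inside the loop, a call such as `wait_until(function() identical(trimws(html_text(html_elements(page, xpath = "//mfe-application-navigation/div/div/div/span")))[1], as.character(i)))` might stand in for the sleep, assuming the first span holds the current page number as the `1 From 2491` output suggests. This has not been tested against the live site. Checking only the number of rendered items is probably not enough, since the previous page's items may still be in the DOM right after the click.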
    

    Results:

    ad_links_all |> 
      lapply(\(lnks) tibble(href = lnks)) |> 
      bind_rows(.id = "page")
    #> # A tibble: 20 × 2
    #>    page  href                                                                   
    #>    <chr> <chr>                                                                  
    #>  1 1     https://www.supralift.com/uk/used-forklifts/combilift-combilift-wfc-ot…
    #>  2 1     https://www.supralift.com/uk/used-forklifts/jungheinrich-efg-216k-464d…
    #>  3 1     https://www.supralift.com/uk/used-forklifts/linde-e-12-evo-386-02-3%20…
    #>  4 1     https://www.supralift.com/uk/used-forklifts/still-rx60-40-batt.neu-4%2…
    #>  5 1     https://www.supralift.com/uk/used-forklifts/still-fm-x14-reach%20truck…
    #>  6 1     https://www.supralift.com/uk/used-forklifts/still-fm-x17n-reach%20truc…
    #>  7 1     https://www.supralift.com/uk/used-forklifts/still-cop-l07-order%20pick…
    #>  8 1     https://www.supralift.com/uk/used-forklifts/still-rx20-20p-4%20wheel%2…
    #>  9 1     https://www.supralift.com/uk/used-forklifts/still-rx20-20pl-4%20wheel%…
    #> 10 1     https://www.supralift.com/uk/used-forklifts/still-rx60-35-4%20wheel%20…
    #> 11 2     https://www.supralift.com/uk/used-forklifts/still-exv-sf14-pallet%20st…
    #> 12 2     https://www.supralift.com/uk/used-forklifts/still-rx60-50-batt.neu-4%2…
    #> 13 2     https://www.supralift.com/uk/used-forklifts/still-rx60-30-4%20wheel%20…
    #> 14 2     https://www.supralift.com/uk/used-forklifts/still-rx60-35-4%20wheel%20…
    #> 15 2     https://www.supralift.com/uk/used-forklifts/still-exv-sf14-pallet%20st…
    #> 16 2     https://www.supralift.com/uk/used-forklifts/still-fm-x17n-reach%20truc…
    #> 17 2     https://www.supralift.com/uk/used-forklifts/still-rx20-16c-3%20wheel%2…
    #> 18 2     https://www.supralift.com/uk/used-forklifts/still-rx20-15-3%20wheel%20…
    #> 19 2     https://www.supralift.com/uk/used-forklifts/still-cop-l07-order%20pick…
    #> 20 2     https://www.supralift.com/uk/used-forklifts/still-rx70-30t-4%20wheel%2…
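
The `str_split_i("\\?searchKey=", 1)` step in the loop keeps only the part of each href that comes before the `?searchKey=` query string. A minimal base R sketch of the same idea, for illustration only; the URLs below are made up and are not real listings:

```r
# Made-up hrefs for illustration; real ones come from html_attr("href").
hrefs <- c(
  "https://example.com/uk/used-forklifts/item-a?searchKey=abc123",
  "https://example.com/uk/used-forklifts/item-b"
)

# Drop everything from the first "?searchKey=" to the end of the string.
# Strings without that query string are returned unchanged.
clean <- sub("\\?searchKey=.*$", "", hrefs)
clean
```

Either approach leaves hrefs without a `searchKey` parameter untouched, so it should be safe to apply to every link collected.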