r, web-scraping, tidyr, rvest

Scraping Amazon review issue in R


For learning purposes, I am trying to scrape an Amazon product page; more specifically, I need to study the review section.

The code I am trying now is:

if(!"pacman" %in% installed.packages()[,"Package"]) install.packages("pacman")
pacman::p_load(rvest, dplyr, tidyr, stringr)

# Airpods
asin_code <- "B09JQMJHXY"

url <- paste0("https://www.amazon.com/dp/", asin_code)
doc <- read_html(url)

#obtain the text in the node, remove "\n" from the text, and remove white space
prod <- html_nodes(doc, "#productTitle") %>% 
  html_text() %>% 
  gsub("\n", "", .) %>% 
  trimws()

prod

This scrapes only the title, but the same logic applies to the rest of the page.

The problem is that the first time I ran it, I got the right output (the title).

When I ran the same code 15 minutes later, I got character(0).

Has Amazon blocked my IP, or something similar?

My R version is 4.3.2.


Solution

    library(rvest)
    library(purrr)   # provides map_dfr()
    library(tibble)  # provides tibble()
    
    # Download and parse the product page
    page <- "https://www.amazon.com/dp/B09JQMJHXY" |> 
      read_html()
    
    # Select each review block, then build one tibble row per review
    page |> 
      html_elements(".a-section.review.cr-desktop-review-page-0") |> 
      map_dfr(~ tibble(
        title = html_element(.x, ".cr-original-review-content") |> 
          html_text2(),
        rating = html_element(.x, ".a-icon-alt") |> 
          html_text2(), 
        review = html_element(.x, ".reviewText") |>  
          html_text2(),
        date = html_element(.x, ".review-date") |> 
          html_text2()
      ))
    
    # A tibble: 5 × 4
      title                                                       rating review date 
      <chr>                                                       <chr>  <chr>  <chr>
    1 AirPod Pro                                                  5.0 o… Excel… Revi…
    2 10/10 llegó como tenía que llegar                           5.0 o… Muy b… Revi…
    3 Buenos audifonos.                                           5.0 o… Muy b… Revi…
    4 Son buenos, pero me salieron con un error, pero Apple con … 4.0 o… Este … Revi…
    5 Cómodos y súper fáciles de usar                             5.0 o… Son l… Revi…
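
To tell the two cases in the question apart (the title found versus character(0)), it may help to check explicitly whether the node exists before extracting its text. This is only a minimal sketch: it assumes the later request returned a page that simply lacks #productTitle, which this thread does not confirm. The helper name extract_title and the two inline pages are hypothetical and are not real Amazon markup.

```r
library(rvest)

# Hypothetical helper: return the trimmed product title, or NA when the
# "#productTitle" node is absent from the parsed page.
extract_title <- function(doc) {
  node <- html_element(doc, "#productTitle")
  # html_element() yields an "xml_missing" object when nothing matches
  if (inherits(node, "xml_missing")) {
    return(NA_character_)
  }
  trimws(html_text(node))
}

# Offline check with two hand-written pages (assumptions, not Amazon HTML)
ok_page <- read_html(
  '<html><body><span id="productTitle">  Example title \n</span></body></html>'
)
other_page <- read_html(
  "<html><body><p>Some other page without the title node</p></body></html>"
)

extract_title(ok_page)
extract_title(other_page)
```

If the helper returns NA for the live page, inspecting what was actually downloaded, for example with html_text2(doc), should show which page came back instead of the product page.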