r web-scraping http-status-code-404 handleerror

Skipping the error: Error in open.connection(x, "rb") : HTTP error 404


Hello, I'm new to the fascinating world of R, and I have not been able to skip URLs that do not exist. How can I handle them so that they are not raised as an error? Thanks for your help.


title: "error"
author: "FJSG"
date: "27/6/2020"
output: html_document

knitr::opts_chunk$set(echo = TRUE)

library(xml2)
library(rvest)
library(tidyverse)
library(lubridate)

zora_core <- read_html("https://zora.medium.com/the-zora-music-canon-5a29296c6112")

Los_100 <- data.frame(album      = html_nodes(zora_core, "h1:not(#96c9)") %>% 
                                     html_text() %>% 
                                     str_trim(side = "both"),
                      interprete = html_nodes(zora_core, "strong em , p#73e0 strong") %>% 
                                     html_text() %>% 
                                     str_remove_all("^by") %>%
                                     str_extract("[a-zA-Z].+(?=[(])") %>% str_trim(side = "both"),
                      año        = html_nodes(zora_core, "strong em , p#73e0 strong") %>% 
                                     html_text %>% 
                                     str_extract("([[:digit:]]){4}"),
                      liga       = paste0("https://en.wikipedia.org/wiki/", html_nodes(zora_core, "strong em , p#73e0 strong") %>% 
                                     html_text() %>%
                                     str_remove_all("^by") %>%
                                     str_extract("[a-zA-Z].+(?=[(])") %>% str_trim(side = "both") %>% str_replace_all(" ","_")))

carga <- function(url){
  
         perfil_raw <- read_html(url)
         data.frame(interprete = html_node(perfil_raw, "h1#firstHeading") %>% 
                                 html_text() %>% str_trim(side = "both"))
         
}
lista <- Los_100$liga[1:16] # The URL at position 16 does not exist; how can I avoid the error?

datos_personales <- map_df(lista,carga)



Solution

  • Error handling is useful to learn in R in general, and it becomes essential when working with HTTP requests.

    In your case, it is best to wrap carga in a tryCatch. tryCatch evaluates the expression passed as its first argument; if that expression throws an error, the error is caught and passed to the handler function supplied through the error argument, and the handler's return value becomes the result of tryCatch.
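
    To see the mechanism in isolation, here is a minimal sketch that needs no network access. The function safe_sqrt and its error message are hypothetical, made up for illustration only; the point is that the handler receives the condition object and that whatever the handler returns becomes the value of tryCatch.

    ```r
    # Hypothetical helper for illustration: fails on negative input.
    safe_sqrt <- function(x) {
      tryCatch(
        {
          if (x < 0) stop("negative input: ", x)
          sqrt(x)
        },
        # The handler receives the condition object `e`;
        # conditionMessage() extracts its text.
        error = function(e) {
          message("Caught: ", conditionMessage(e))
          NA_real_  # the value of tryCatch() when an error occurred
        }
      )
    }

    ok  <- safe_sqrt(9)   # no error, so the handler is never called
    bad <- safe_sqrt(-1)  # the handler runs and its NA_real_ is returned
    ```

    Returning a placeholder of the same type as the normal result (a number here, a one-column data frame in your case) is what lets the calling code carry on unchanged.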

    If an error is thrown we need to return a data frame with a single column called interprete so that map_df can bind it together with the other results:

    carga_catch <- function(x)
    {
      tryCatch(return(carga(x)),
               error = function(e) return(data.frame(interprete = "**inexistente**")))
    }
    
    map_df(lista, carga_catch)
    #>               interprete
    #> 1        Ella Fitzgerald
    #> 2          Sarah Vaughan
    #> 3         Billie Holiday
    #> 4  Sister Rosetta Tharpe
    #> 5             Lena Horne
    #> 6        Mahalia Jackson
    #> 7          Abbey Lincoln
    #> 8             Etta James
    #> 9         Leontyne Price
    #> 10       Marian Anderson
    #> 11      Dinah Washington
    #> 12                Odetta
    #> 13        Dionne Warwick
    #> 14          The Supremes
    #> 15           Nina Simone
    #> 16       **inexistente**
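
    As an alternative sketch, purrr (already loaded through tidyverse) provides possibly(), which wraps a function so that it returns a fallback value instead of raising an error. The example below is an assumption-laden stand-in rather than your real scraper: carga_demo, carga_safe, and the example.org URLs are hypothetical, and the simulated failure replaces the 404 that read_html would throw.

    ```r
    library(purrr)  # map_df() additionally needs dplyr to be installed

    # Hypothetical stand-in for carga(): fails on one input,
    # the way read_html() fails on a URL that returns 404.
    carga_demo <- function(url) {
      if (grepl("missing", url)) stop("HTTP error 404")
      data.frame(interprete = basename(url))
    }

    # possibly() returns a wrapped function that yields `otherwise`
    # whenever the original function throws an error.
    carga_safe <- possibly(
      carga_demo,
      otherwise = data.frame(interprete = NA_character_)
    )

    urls <- c("https://example.org/wiki/Nina_Simone",
              "https://example.org/wiki/missing_page")

    # One row per URL; the failing URL contributes the fallback row.
    resultado <- map_df(urls, carga_safe)
    ```

    Using NA as the fallback, rather than a marker string, makes it easy to drop the failed rows afterwards, for example with tidyr::drop_na(); whether that is preferable to a visible marker depends on what you do with the result.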
    
    

    Apart from error handling, I think your code is very good for someone just beginning in R. It achieves a lot in a few lines of code and is perfectly readable. Good work!