
Multiple scraping with R

I'm trying to scrape many PDFs using R. I've found multiple examples of how to do this (here's one; here's another), but I can't get any of them to work. I want to download files from the following main site for a particular year, for example 2018.

I need the PDFs for the Beige Book, Tealbook A, and the statement.

I've attempted this in several ways. My first try was to modify the code from the first link:


library(rvest)
library(purrr)

url <- ""

page <- read_html(url)

# Collect the href attribute of every <a> element on the page
urls_pdf <- page %>% 
  html_elements("a") %>% 
  html_attr("href")

urls_pdf[1:3] %>% walk2(basename(.), download.file, mode = "wb")

dir(pattern = "\\.pdf")

but I get nothing.

Second, I tried a loop, after working out a URL pattern for some Tealbook A dates:

library(lubridate)

# fomc_dates is assumed to be a vector of FOMC meeting dates (class Date) defined earlier

# Initialize list to store links for tealbook A reports
tealA <- list()

# Generate links for tealbook A reports
for (i in seq_along(fomc_dates)) {
  this_fomc <- fomc_dates[i]
  this_teal_A <- this_fomc - days(12)
  link <- paste0("", format(this_fomc, "%Y%m%d"), "tealbooka", format(this_teal_A, "%Y%m%d"), ".pdf")
  tealA[[i]] <- link
}

The problem is that not all links follow this pattern, so it only works for some of them. Any ideas on how to do this in the most automated way possible would be greatly appreciated!


  • Not the most elegant way of doing it, but it gets the job done:

    library(rvest)

    # Build the URL of each yearly page (the base URL is left blank, as in the post)
    generate_links <- function(start_year, end_year) {
      links <- character()
      for (year in start_year:end_year) {
        links <- c(links, paste0("", year, ".htm"))
      }
      links
    }

    # Function to download PDF files
    download_pdfs <- function(links, output_directory) {
      # Create the output directory if it doesn't exist
      if (!dir.exists(output_directory)) {
        dir.create(output_directory, recursive = TRUE)
      }
      # Loop over each link and download the corresponding PDF file
      for (this_link in links) {
        file_name <- file.path(output_directory, basename(this_link))
        response <- httr::GET(this_link)
        if (httr::status_code(response) == 200) {
          bin_data <- httr::content(response, "raw")
          writeBin(bin_data, file_name)
          cat("Downloaded:", file_name, "\n")
        } else {
          cat("Failed to download:", this_link, "\n")
        }
      }
    }

    # Example: generate links for 2017 and 2018
    start_year <- 2017
    end_year <- 2018
    url_links <- generate_links(start_year, end_year)

    for (current_url in url_links) {
      # Read the HTML content of the webpage
      page <- read_html(current_url)
      # Extract all links from the webpage
      links <- page %>%
        html_elements("a") %>%
        html_attr("href")
      # Keep links that contain "BeigeBook", "tealbooka", or "statement"
      pdf_links <- grep("(BeigeBook|tealbooka|statement)", links, ignore.case = TRUE, value = TRUE)
      # Of those, keep only links that point to PDF files
      pdf_links <- grep("\\.pdf$", pdf_links, value = TRUE)
      # Resolve relative hrefs against the page URL (absolute links are unchanged)
      pdf_links <- xml2::url_absolute(pdf_links, current_url)
      # "pdfs" is a placeholder output directory; change it as needed
      download_pdfs(pdf_links, "pdfs")
    }
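
The filtering step can be checked on its own, without touching the network. Below is a minimal sketch in base R that keeps only links matching one of the three document types and ending in `.pdf`. The `/files/...` paths and the helper name `keep_wanted_pdfs` are hypothetical, made up to resemble the file-name pattern described in the question; the real hrefs on the site may look different.

```r
# Hypothetical hrefs, as they might come back from html_attr("href")
hrefs <- c(
  "/files/FOMC20180131tealbooka20180119.pdf",
  "/files/FOMC20180131tealbookb20180125.pdf",
  "/files/BeigeBook_20180117.pdf",
  "/files/statement20180131.htm",
  "/files/statement20180131.pdf",
  NA
)

# Keep links that mention one of the keywords AND point to a PDF file
keep_wanted_pdfs <- function(hrefs,
                             keywords = c("BeigeBook", "tealbooka", "statement")) {
  # <a> elements without an href come back as NA, so drop those first
  hrefs <- hrefs[!is.na(hrefs)]
  keyword_pattern <- paste0("(", paste(keywords, collapse = "|"), ")")
  is_wanted <- grepl(keyword_pattern, hrefs, ignore.case = TRUE)
  is_pdf <- grepl("\\.pdf$", hrefs, ignore.case = TRUE)
  hrefs[is_wanted & is_pdf]
}

wanted <- keep_wanted_pdfs(hrefs)
print(wanted)
```

Combining both conditions in one logical index avoids the easy mistake of running the second `grep` on the unfiltered vector, which would keep every PDF on the page regardless of document type.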