I am learning to scrape data and am using the website https://quotes.toscrape.com/ as a training dataset. When I try to collect the about section links, I get this error:
Error in html_attr(html_elements(page, ".a"), href = "/author/Albert-Einstein") : unused argument (href = "/author/Albert-Einstein")
I have been using SelectorGadget to identify the CSS selectors and plug them into the code block. I check with inspect element when I can't get it to work. Can anyone help?
Here's my code so far:
#Libraries
library(rvest)
library(dplyr)
#Page grab
link <- "https://quotes.toscrape.com/"
page <- read_html(link)
#Variables
name <- page |>
html_elements(".text") |>
html_text2()
author <- page |>
html_elements(".author") |>
html_text2()
about <- page |>
html_elements(".a") |>
html_attr(".href="/author/Albert-Einstein"")
# html_text2()
list(about)
I've tried putting quotes around only the author section, but it also tells me there's an unused argument. When I remove all quotation marks, it tells me there's an unexpected "/", and when I remove that I get the same unused argument issue.
The CSS selector ".a" would match all elements of class "a", e.g.:
<div class = "a">foobar</div>
In your example you seem to be after <a> elements with a specific href attribute value, e.g.:
<a href="/author/Albert-Einstein">(about)</a>
For this you'd use "a[href = '/author/Albert-Einstein']".
You can then pass that set of elements to html_attr() to extract the attribute value by its name:
page |>
html_elements("a[href = '/author/Albert-Einstein']") |>
html_attr("href")
Though this is a somewhat questionable strategy, as you already know the value and have hard-coded it in your code.
You could approach this by using a substring match instead of an exact match to get links where href starts with "/author/":
page |>
html_elements("a[href ^= '/author/']") |>
html_attr("href")
#> [1] "/author/Albert-Einstein" "/author/J-K-Rowling"
#> [3] "/author/Albert-Einstein" "/author/Jane-Austen"
#> [5] "/author/Marilyn-Monroe" "/author/Albert-Einstein"
#> [7] "/author/Andre-Gide" "/author/Thomas-A-Edison"
#> [9] "/author/Eleanor-Roosevelt" "/author/Steve-Martin"
To learn more about attribute selectors: https://developer.mozilla.org/en-US/docs/Learn_web_development/Core/Styling_basics/Attribute_selectors
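To compare the selectors discussed so far without hitting the network, here is a minimal sketch. The markup passed to minimal_html() is a made-up stand-in, not the real page; only the selectors themselves come from the answer above.

```r
library(rvest)

# Hypothetical stand-in markup (an assumption for illustration, not the real page)
page <- minimal_html('
  <div class="a">foobar</div>
  <small class="author">Albert Einstein</small>
  <a href="/author/Albert-Einstein">(about)</a>
  <a href="/tag/change/page/1/">change</a>
')

# ".a" is a class selector: it matches the <div class="a">, not the links
class_a_text <- html_elements(page, ".a") |> html_text2()

# "a" is a type selector: it matches every <a> element
all_hrefs <- html_elements(page, "a") |> html_attr("href")

# exact attribute match: only the link with this specific href
exact_hrefs <- html_elements(page, "a[href='/author/Albert-Einstein']") |> html_attr("href")

# prefix match: every link whose href starts with "/author/"
author_hrefs <- html_elements(page, "a[href^='/author/']") |> html_attr("href")
```

Note that html_attr() takes the attribute name as a plain string ("href"); filtering by the attribute value belongs in the CSS selector, which is why the call in the question fails.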
Presumably those vectors of yours are going to be used as data.frame columns, so they must be of equal length and also align. But what if there are a few authors without an about page, i.e. no link after the name? While collecting target elements from the whole HTML document (your page object) often works well enough, it could easily lead to cases where frame creation fails because the input vector lengths differ.
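A small offline sketch of that failure mode follows. The markup is hypothetical (three quotes, the second one deliberately lacking an about link); the real page may never look like this.

```r
library(rvest)

# Hypothetical markup: the second quote has no about link
page <- minimal_html('
  <div class="quote">
    <span class="text">Quote one</span>
    <small class="author">Author One</small> <a href="/author/Author-One">(about)</a>
  </div>
  <div class="quote">
    <span class="text">Quote two</span>
    <small class="author">Author Two</small>
  </div>
  <div class="quote">
    <span class="text">Quote three</span>
    <small class="author">Author Three</small> <a href="/author/Author-Three">(about)</a>
  </div>
')

# Collecting from the whole document gives vectors of different lengths
author <- page |> html_elements(".author") |> html_text2()
about  <- page |> html_elements("a[href^='/author/']") |> html_attr("href")

n_author <- length(author)  # three authors
n_about  <- length(about)   # only two links

# data.frame() rejects these columns; capture the error message instead of stopping
result <- tryCatch(
  data.frame(author, about),
  error = function(e) conditionMessage(e)
)
```

Even when the lengths happen to be compatible, nothing guarantees that the second link belongs to the second author, so alignment is a separate problem from length.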
Here's one slightly different but more robust approach to collect quotes, authors and about links.
First we'll collect container elements, each of which will form a row in the resulting frame; then we'll use that node set instead of the whole HTML page to extract individual items. If some elements are missing in some containers, the resulting vectors will have NAs at those locations and the vector lengths will be equal, making it safe to use them as frame columns.
Note the use of plural html_elements() and singular html_element(); the latter makes sure that the output length is the same as the input length.
library(rvest)
link <- "https://quotes.toscrape.com/"
page <- read_html(link)
# collect all container elements
quote_divs <- html_elements(page, css = "div.quote")
# extract data from each container element,
# object from html_element() (singular) is always the same length as its input,
# so resulting vectors are safe to use for building frames with data.frame() / tibble()
tibble::tibble(
  text = html_element(quote_divs, css = "span.text") |> html_text2(),
  # tibble::char() just for formatting output
  name = html_element(quote_divs, css = "small.author") |> html_text2() |> tibble::char(min_chars = 15),
  # subsequent-sibling combinator `~`,
  # select <a> that follows <small class="author">
  about = html_element(quote_divs, css = "small.author ~ a") |> html_attr("href") |> tibble::char(min_chars = 20),
)
#> # A tibble: 10 × 3
#>    text                                     name            about
#>    <chr>                                    <char>          <char>
#>  1 “The world as we have created it is a p… Albert Einstein /author/Albert-Eins…
#>  2 “It is our choices, Harry, that show wh… J.K. Rowling    /author/J-K-Rowling
#>  3 “There are only two ways to live your l… Albert Einstein /author/Albert-Eins…
#>  4 “The person, be it gentleman or lady, w… Jane Austen     /author/Jane-Austen
#>  5 “Imperfection is beauty, madness is gen… Marilyn Monroe  /author/Marilyn-Mon…
#>  6 “Try not to become a man of success. Ra… Albert Einstein /author/Albert-Eins…
#>  7 “It is better to be hated for what you … André Gide      /author/Andre-Gide
#>  8 “I have not failed. I've just found 10,… Thomas A. Edis… /author/Thomas-A-Ed…
#>  9 “A woman is like a tea bag; you never k… Eleanor Roosev… /author/Eleanor-Roo…
#> 10 “A day without sunshine is like, you kn… Steve Martin    /author/Steve-Martin
Created on 2025-07-25 with reprex v2.1.1
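On the live page every quote has an about link, so the NA behaviour is not visible in the output above. The following sketch applies the same container-first approach to hypothetical markup in which the second quote has no link; the markup and the `quotes` variable name are assumptions made for illustration.

```r
library(rvest)

# Hypothetical markup: the second quote has no about link
page <- minimal_html('
  <div class="quote">
    <span class="text">Quote one</span>
    <small class="author">Author One</small> <a href="/author/Author-One">(about)</a>
  </div>
  <div class="quote">
    <span class="text">Quote two</span>
    <small class="author">Author Two</small>
  </div>
  <div class="quote">
    <span class="text">Quote three</span>
    <small class="author">Author Three</small> <a href="/author/Author-Three">(about)</a>
  </div>
')

# one container per row
quote_divs <- html_elements(page, "div.quote")

# html_element() (singular) returns exactly one result per container,
# with a missing node where nothing matched, so all columns have 3 entries
quotes <- data.frame(
  text  = html_element(quote_divs, "span.text") |> html_text2(),
  name  = html_element(quote_divs, "small.author") |> html_text2(),
  about = html_element(quote_divs, "small.author ~ a") |> html_attr("href")
)

quotes
```

The second row should show NA in the about column while text and name stay aligned with their own quote.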