I want to scrape ETF summary statistics from Yahoo Finance. For example, the page for IVV is https://finance.yahoo.com/quote/IVV. The table I want to scrape sits below the chart, and the key fields are NAV, PE Ratio (TTM), Yield, Beta and Expense Ratio. I previously used the rvest package as follows, but it no longer works because the page structure has changed:
library(rvest)
library(purrr)
library(dplyr)

ticker <- "IVV"
url <- paste0("https://finance.yahoo.com/quote/", ticker)
df <- url %>%
  read_html() %>%
  html_table() %>%
  map_df(bind_cols) %>%
  as_tibble()
Any help is appreciated.
It looks like there is no longer a table element on that page; the info you're after is now contained in list elements. I have tweaked the code to capture the label and value from each list element.
library(rvest)
library(purrr)
library(dplyr)

ticker <- "IVV"
url <- paste0("https://finance.yahoo.com/quote/", ticker)
ivv_html <- read_html(url)
node_txt <- ".svelte-tx3nkj" # This contains "table" info of interest

df <- ivv_html %>%
  html_nodes(paste0(".container", node_txt)) %>%
  map_dfr(~{
    tibble(
      label = html_nodes(.x, paste0(".label", node_txt)) %>%
        html_text(trim = TRUE)
      ,value = html_nodes(.x, paste0(".value", node_txt)) %>%
        html_text(trim = TRUE)
    )
  })

df %>%
  filter(label %in% c("NAV", "PE Ratio (TTM)", "Yield", "Beta (5Y Monthly)", "Expense Ratio (net)"))
# A tibble: 5 × 2
  label               value
  <chr>               <chr>
1 NAV                 519.85
2 PE Ratio (TTM)      26.22
3 Yield               1.37%
4 Beta (5Y Monthly)   1.00
5 Expense Ratio (net) 0.03%
Adding the .container class limits the results to just the "table" located under the chart; otherwise, all info tagged with the class .svelte-tx3nkj on that page will be extracted.
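To see the effect of the compound selector without hitting the live site, here is a minimal sketch on a made-up HTML snippet. The markup is an assumption modelled on the selectors above (the second div and its "Previous Close" label are invented for illustration), so the real page may well differ.

```r
library(rvest)

# Hypothetical markup: one "table" container plus an unrelated block
# that carries the same hashed class.
page <- read_html('
  <div class="container svelte-tx3nkj">
    <ul>
      <li><span class="label svelte-tx3nkj">NAV</span><span class="value svelte-tx3nkj">519.85</span></li>
      <li><span class="label svelte-tx3nkj">Yield</span><span class="value svelte-tx3nkj">1.37%</span></li>
    </ul>
  </div>
  <div class="header svelte-tx3nkj">
    <span class="label svelte-tx3nkj">Previous Close</span>
  </div>
')

# Hashed class alone: matches every element tagged with it,
# inside and outside the container.
all_tagged <- page %>% html_nodes(".svelte-tx3nkj")

# Compound selector: only elements carrying BOTH classes.
containers <- page %>% html_nodes(".container.svelte-tx3nkj")

# Labels found inside the container only; "Previous Close" is excluded.
labels_in_container <- containers %>%
  html_nodes(".label.svelte-tx3nkj") %>%
  html_text(trim = TRUE)
```

In this snippet the hashed class alone picks up elements from both blocks, while the compound selector narrows the search to the single container before the labels are read.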
UPD 2024-08-23, following HTML structure change:
node_txt <- "yf-tx3nkj"

# Reuses ivv_html from above; assign the result so the filter below sees the new data.
# list_rbind() requires purrr >= 1.0.0.
df <- ivv_html %>%
  html_nodes(paste0("ul.", node_txt)) %>%
  html_nodes(paste0(".", node_txt)) %>%
  map(~{
    tibble(
      label = html_nodes(.x, paste0(".label.", node_txt)) %>%
        html_text(trim = TRUE)
      ,value = html_nodes(.x, paste0(".value.", node_txt)) %>%
        html_text(trim = TRUE)
    )
  }) %>%
  list_rbind()

df %>%
  filter(label %in% c("NAV", "PE Ratio (TTM)", "Yield", "Beta (5Y Monthly)", "Expense Ratio (net)"))
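Since the hashed part of the class name has already changed once (svelte-tx3nkj to yf-tx3nkj), it may be worth matching only on the stable-looking parts. The sketch below is one possible approach, not a tested fix for the live page: extract_stats() is a hypothetical helper, and it assumes each stat is an li holding exactly one .label and one .value element, which is inferred from the selectors above rather than confirmed against Yahoo's markup.

```r
library(rvest)
library(purrr)
library(dplyr)

# Hypothetical helper: ignores the hashed class and keys on .label / .value only.
extract_stats <- function(page) {
  page %>%
    html_nodes("li") %>%
    map(~{
      label <- html_text(html_nodes(.x, ".label"), trim = TRUE)
      value <- html_text(html_nodes(.x, ".value"), trim = TRUE)
      # Keep only list items with exactly one label and one value.
      if (length(label) == 1 && length(value) == 1) {
        tibble(label = label, value = value)
      } else {
        tibble(label = character(), value = character())
      }
    }) %>%
    bind_rows()
}

# Made-up snippets that differ only in the hashed class suffix.
old_page <- read_html('<ul>
  <li><span class="label svelte-tx3nkj">NAV</span><span class="value svelte-tx3nkj">519.85</span></li>
  <li>no stats here</li>
</ul>')
new_page <- read_html('<ul>
  <li><span class="label yf-tx3nkj">NAV</span><span class="value yf-tx3nkj">519.85</span></li>
  <li>no stats here</li>
</ul>')

old_stats <- extract_stats(old_page)
new_stats <- extract_stats(new_page)
```

On the real page you would call extract_stats(ivv_html) and then apply the same filter() as above. The trade-off is that dropping the hash makes the selector broader, so it could pick up unrelated list items if other parts of the page use .label and .value too.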