I am trying to scrape the table from 'https://www.metabolomicsworkbench.org/data/mb_structure_ajax.php'.
The rvest code I found online did not work:
library(rvest)
url <- "https://www.metabolomicsworkbench.org/data/mb_structure_ajax.php"
A <- url %>%
  read_html() %>%
  html_nodes(xpath = '//*[@id="containerx"]/div[1]/table') %>%
  html_table()
The result, A, is 'list of 0'.
How should I fix this code, or is there a better way to do it?
Thanks in advance.
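The table is generated by JavaScript, so it is not part of the HTML that read_html() downloads. Instead, send a POST request to
https://www.metabolomicsworkbench.org/data/mb_structure_tableonly.php
and parse the content of the response. The request body carries the page number in the parameter page, and rvest can send it. Here is the code to scrape all pages:
library(rvest)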
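url <- "https://www.metabolomicsworkbench.org/data/mb_structure_tableonly.php"
# Open a session (html_session() was renamed session() in rvest 1.0.0)
pg <- html_session(url)
# Request every page and row-bind the tables into one data frame.
# Try a small range first, or scrape in several batches and combine the
# data frames afterwards, in case something fails partway through.
data <-
  purrr::map_dfr(
    1:4288,
    function(i) {
      # request_POST() is an unexported rvest function (hence :::),
      # so it may change or disappear between rvest versions
      pg <- rvest:::request_POST(pg,
                                 url,
                                 body = list(
                                   page = i
                                 ))
      read_html(pg) %>%
        html_node("table") %>%
        html_table()
    }
  )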
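If rvest:::request_POST is not available in your rvest version, the same POST request can probably be sent with httr directly. The following is only a sketch under assumptions: it assumes the endpoint accepts the same body as above (a single page parameter), and the helper names parse_table_page, fetch_page and scrape_pages are made up for illustration. It also records which pages failed, so they can be retried instead of restarting the whole run.

```r
library(rvest)

table_url <- "https://www.metabolomicsworkbench.org/data/mb_structure_tableonly.php"

# Parse the first <table> found in an HTML string into a data frame.
parse_table_page <- function(html_text) {
  html_table(html_node(read_html(html_text), "table"))
}

# POST one page number and parse the table in the response.
# Assumption: the server reads `page` from the request body, as in the
# code above. httr::POST() sends a list body as multipart by default.
fetch_page <- function(i, url = table_url) {
  resp <- httr::POST(url, body = list(page = i))
  httr::stop_for_status(resp)
  parse_table_page(httr::content(resp, as = "text", encoding = "UTF-8"))
}

# Fetch several pages; a page that errors is recorded rather than
# aborting the whole run.
scrape_pages <- function(pages, fetch = fetch_page) {
  results <- lapply(pages, function(i) {
    tryCatch(fetch(i), error = function(e) NULL)
  })
  ok <- !vapply(results, is.null, logical(1))
  list(
    data = do.call(rbind, results[ok]),  # NULL if every page failed
    failed = pages[!ok]
  )
}

# Example usage (makes network requests, so it is left commented out):
# first <- scrape_pages(1:5)
# retry <- scrape_pages(first$failed)
```

Scraping in batches like this, for example 1:500, then 501:1000, and so on, keeps the pages already downloaded if a later request fails.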