[SOLVED] How to scrape a large table from a php website using R

I am trying to scrape the table from 'https://www.metabolomicsworkbench.org/data/mb_structure_ajax.php'.

The rvest code I found online did not work:

library(rvest)
url <- "https://www.metabolomicsworkbench.org/data/mb_structure_ajax.php"
A <- url %>%
  read_html() %>%
  html_nodes(xpath='//*[@id="containerx"]/div[1]/table') %>%
  html_table()

A comes back as an empty list ('list of 0').

How should I fix this code, or is there a better way to do it?

Thanks in advance.

Solution

The table is loaded by JavaScript after the page loads, so it is not in the HTML that read_html() downloads, which is why the XPath matches nothing. Here is what to do:

1. Open the browser's developer tools and go to the Network tab.
2. Click one of the page numbers and watch the requests (I clicked page 4). The page sends a POST request to https://www.metabolomicsworkbench.org/data/mb_structure_tableonly.php and receives the table content in response. The request's form data includes a page parameter holding the page number.
3. Mimic that POST request with rvest. Here is the code to scrape all pages:

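library(rvest)

url <- "https://www.metabolomicsworkbench.org/data/mb_structure_tableonly.php"
pg <- html_session(url)  # start a session on the table-only endpoint

# 1:4288 covers every page. Try a small range first, or scrape in several
# batches and combine the data frames afterwards, in case a request fails
# partway through.
data <-
  purrr::map_dfr(
    1:4288,
    function(i) {
      # request_POST() is an unexported rvest helper (hence `:::`), so it may
      # be missing or behave differently in other rvest versions.
      pg <- rvest:::request_POST(pg,
                                 url,
                                 body = list(
                                   page = i  # page number to fetch
                                 ))
      # Parse the returned HTML and convert its table to a data frame.
      read_html(pg) %>%
        html_node("table") %>%
        html_table()
    }
  )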
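
Because the code above depends on an unexported rvest function, here is a rough alternative sketch that sends the same POST request with httr and scrapes in batches, as suggested in the comment. It assumes the endpoint only needs the page form field and answers a plain httr request; the helper names parse_table, fetch_page and scrape_pages, the pause argument and the batch sizes are made up for illustration and are not part of the original answer.

```r
library(httr)
library(rvest)

url <- "https://www.metabolomicsworkbench.org/data/mb_structure_tableonly.php"

# Parse the first <table> found in an HTML string into a data frame.
parse_table <- function(html_text) {
  read_html(html_text) %>%
    html_node("table") %>%
    html_table()
}

# Fetch one page by POSTing the `page` form field (assumed to be the only
# field the endpoint needs).
fetch_page <- function(i) {
  resp <- POST(url, body = list(page = i), encode = "form")
  stop_for_status(resp)  # turn HTTP errors into R errors
  parse_table(content(resp, as = "text", encoding = "UTF-8"))
}

# Scrape a range of pages. A page that fails is reported and skipped, so one
# bad request does not throw away the rest of the batch.
scrape_pages <- function(pages, fetch = fetch_page, pause = 0.5) {
  results <- lapply(pages, function(i) {
    Sys.sleep(pause)  # small delay between requests
    tryCatch(
      fetch(i),
      error = function(e) {
        message("page ", i, " failed: ", conditionMessage(e))
        NULL
      }
    )
  })
  results <- Filter(Negate(is.null), results)  # drop failed pages
  do.call(rbind, results)
}

# Example use, run only in an interactive session because it hits the network:
# try a few pages first, then continue in batches and combine the results.
if (interactive()) {
  first  <- scrape_pages(1:5)
  second <- scrape_pages(6:10)
  data   <- rbind(first, second)
}
```

Scraping in batches means a failure partway through costs at most one batch, and the messages show which pages to retry.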