Tags: r, web-scraping, xpath, rvest

Web scraping with rvest: using an XPath to extract a table as a data frame


I am trying to extract the table on this page, https://clinicaltrials.gov/study/NCT05817110?tab=history, using the XPath copied from Chrome's developer tools.

I have tried using this code, but it does not work. I occasionally engage in web scraping and have a basic understanding of HTML. Thank you in advance for any assistance with this.

library(rvest)

# URL of the webpage
url <- "https://clinicaltrials.gov/study/NCT05817110?tab=history"

# Fetch the webpage (static HTML only; no JavaScript is executed)
webpage <- read_html(url)

# Extract the table using the XPath copied from Chrome
table_data <- webpage %>%
  html_elements(xpath = '//*[@id="study-record-versions-table"]/ctg-card/div/div[2]/ctg-history-changes-table/table') %>%
  html_table()

Solution

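  • It looks like the page uses JavaScript to load its content, so the table is not present in the HTML that read_html() downloads, and the XPath copied from Chrome (which reflects the rendered page) matches nothing. There are a couple of possible solutions: use read_html_live() (added in rvest 1.0.4; it drives a headless Chrome session through the chromote package), or access the data directly at the API link "https://clinicaltrials.gov/api/int/studies/NCT05817110?history=true" (found using the Network tab of the browser's developer tools).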

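    # The endpoint returns JSON; fromJSON() parses it into nested R lists and data frames
    study <- jsonlite::fromJSON("https://clinicaltrials.gov/api/int/studies/NCT05817110?history=true")
    # The version history that the page displays as a table
    study$history$changes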
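To see why the last line of that snippet can be used directly as a table, here is a small offline sketch that does not hit the network. The JSON string is a hypothetical, heavily trimmed stand-in: the field names version, date and status are assumptions for illustration, not the real schema of the clinicaltrials.gov response. It relies only on the documented behaviour that jsonlite::fromJSON() simplifies a JSON array of objects sharing the same keys into a data frame.

```r
library(jsonlite)

# Hypothetical, trimmed stand-in for the API response. The field names
# (version, date, status) are assumptions for illustration only.
payload <- '{
  "history": {
    "changes": [
      {"version": 0, "date": "2023-04-04", "status": "RECRUITING"},
      {"version": 1, "date": "2023-06-12", "status": "RECRUITING"},
      {"version": 2, "date": "2024-01-20", "status": "COMPLETED"}
    ]
  }
}'

# fromJSON() accepts a URL, a file path or a literal JSON string.
# An array of objects sharing the same keys is simplified to a data frame.
study <- fromJSON(payload)
changes <- study$history$changes
str(changes)

# Parse the date strings so the versions can be sorted chronologically
changes$date <- as.Date(changes$date)
latest_first <- changes[order(changes$date, decreasing = TRUE), ]
print(latest_first)
```

With the real endpoint, check names(study$history$changes) first and adjust the column names accordingly; nested objects may come back as data-frame columns, which jsonlite::flatten() spreads into ordinary columns.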