I am trying to extract the table on this page, https://clinicaltrials.gov/study/NCT05817110?tab=history, using the XPath copied from the Chrome browser.
I have tried the code below, but it does not work. I only occasionally do web scraping and have a basic understanding of HTML. Thank you in advance for any assistance.
library(rvest)

# URL of the webpage
url <- "https://clinicaltrials.gov/study/NCT05817110?tab=history"

# Fetch the webpage
webpage <- read_html(url)

# Extract the table using the XPath
table_data <- webpage %>%
  html_nodes(xpath = '//*[@id="study-record-versions-table"]/ctg-card/div/div[2]/ctg-history-changes-table/table/tbody') %>%
  html_table(fill = TRUE)
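It looks like the page uses JavaScript to load its content, so read_html() only receives the initial HTML, which does not contain the table. There are a couple of possible solutions. Use read_html_live() (available in rvest 1.0.4 and later; it requires the chromote package and a local Chrome installation),
or access the data directly at the API link: "https://clinicaltrials.gov/api/int/studies/NCT05817110?history=true" (found using the Network tab of the browser's developer tools).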
study <- jsonlite::fromJSON("https://clinicaltrials.gov/api/int/studies/NCT05817110?history=true")
study$history$changes
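For the first option, here is a rough sketch of how read_html_live() might be used. It is untested against this site and rests on a few assumptions: the `study-record-versions-table` id is taken from the XPath in the question, the three-second pause is an arbitrary guess at how long the page needs to render, and the table is assumed to be a regular `<table>` element inside that container.

```r
library(rvest)

url <- "https://clinicaltrials.gov/study/NCT05817110?tab=history"

# Static fetch: if the table is built by JavaScript, the HTML returned by
# the server is not expected to contain any matching <table> nodes.
static_page <- read_html(url)
length(html_elements(static_page, "#study-record-versions-table table"))

# Live fetch: read_html_live() drives a headless Chrome session through
# the chromote package, so scripts on the page get a chance to run.
live_page <- read_html_live(url)

# Assumption: a fixed pause is enough for the table to render; adjust as needed.
Sys.sleep(3)

# Assumption: the id comes from the XPath in the question.
tables <- live_page %>%
  html_elements("#study-record-versions-table table") %>%
  html_table()

# Only index into the result if something was actually found.
if (length(tables) > 0) {
  tables[[1]]
}
```

The API route above is likely the more robust of the two, since it does not depend on page layout or rendering time; the live-browser approach is mainly useful when no such endpoint can be found.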