rvest

Web scraping using rvest is not working as expected


I have been searching for and reading a number of rvest-related articles on how to web scrape. Unfortunately, my attempts to learn the various components involved have been unsuccessful. I understand some HTML syntax and am fluent in R, but this is my first time working with rvest.

This is the source (inspect view). I'm trying to extract various pieces of information. I started with a test case on "title".

[Image: screenshot of the page source in the browser's inspect view]

Here is my code so far...

library(rvest)

# The URL must be a quoted string
x <- "https://www.govtrack.us/congress/bills/browse?congress=118"

# Walk down the nested containers to the results element
read_html(x) %>%
  html_elements(xpath = "//*[@class = 'container']") %>%
  html_elements(xpath = ".//*[@class = 'searching row']") %>%
  html_elements(xpath = ".//*[@class = 'col-sm-8']") %>%
  html_elements(xpath = ".//*[@class = 'results']") %>%
  html_text()

> "\n    "

As you can see, the element with the results class comes back blank. What can I change to make this code work?

Thank you.


Solution

  • from @joshallen The scraper in the repo is used by the GovTrack people to get the info for the website. The issue you are running into is that the website uses JavaScript to display its contents, and read_html can't execute JavaScript. If you only care about the info on one page, use read_html_live. If you need to scrape the entire website, you will probably need to use Selenium to scroll the webpage; see this answer for how to do it. – Josh Allen
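
To make the comment above concrete, here is a minimal sketch rather than a tested solution for this site. The first part uses a made-up HTML snippet to mimic what read_html sees before any script has run: a results container holding only whitespace. The second part defines scrape_rendered_text, a hypothetical helper (not part of rvest) that wraps read_html_live. It assumes rvest 1.0.4 or later, the chromote package, and a local Chrome or Chromium install. The fixed pause and whichever CSS selector you pass in are assumptions to adjust after inspecting the rendered page.

```r
library(rvest)

# Offline illustration: a container that stays empty until JavaScript fills it.
# This made-up snippet mimics what read_html() receives from such a page.
static_page <- minimal_html('<div class="results">\n    </div>')

static_text <- static_page %>%
  html_elements(".results") %>%
  html_text()
# static_text holds only whitespace, like the blank result in the question.

# Hypothetical helper (not part of rvest): load the page in a headless
# browser so its scripts run, then return the text of every element
# matching `css`.
scrape_rendered_text <- function(url, css, wait = 2) {
  page <- read_html_live(url)  # needs chromote and a Chrome/Chromium install
  Sys.sleep(wait)              # crude pause so the scripts can finish rendering
  page %>%
    html_elements(css) %>%
    html_text2()
}
```

It could be called as scrape_rendered_text("https://www.govtrack.us/congress/bills/browse?congress=118", ".results"). If the result is still empty, increasing wait or checking the selector against the rendered page in the browser's inspector would be the first things to try.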