Navigate here and search for "Download All Holdings" to find the link to the file that I want to scrape (headers and cell contents).
When I pass the file's link to open-uri or Roo, I get the page source back rather than the table contents.
Using Ruby, how can I read the contents of this file? I would ideally like to extract the contents and save the original file in read-only format.
Note: I am already using Mechanize/Nokogiri to scrape and want to supplement/validate my scraping with linked Excel files like the one above.
Just make sure to use Roo::Spreadsheet and not Roo::Excelx, because only Roo::Spreadsheet can open remote URLs directly:
require 'roo'

url = 'https://www.spdrs.com/site-content/xls/TOTL_All_Holdings.xls?fund=TOTL&docname=All+Holdings&onyx_code1=1286&onyx_code2='
sheet = Roo::Spreadsheet.open(url)
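If you also want to keep the original file on disk as read-only, one option is to download the raw bytes yourself and then point Roo at the local copy. The following is only a rough sketch: `save_read_only`, `download_read_only` and `read_rows` are hypothetical helper names, not part of Roo, and I am assuming the link really serves a legacy .xls file, which (as far as I know) needs the separate roo-xls gem with Roo 2.x.

```ruby
require 'open-uri'

# Hypothetical helper: write raw bytes to `path`, then mark the file read-only.
def save_read_only(bytes, path)
  File.binwrite(path, bytes)
  File.chmod(0o444, path) # read permission only, for everyone
  path
end

# Hypothetical helper: fetch `url` and keep an untouched read-only copy.
def download_read_only(url, path)
  bytes = URI.open(url, &:read)
  save_read_only(bytes, path)
end

# Hypothetical helper: return every row of the first sheet as an array.
# The gems are required lazily so the download helpers work without them.
def read_rows(path)
  require 'roo'
  require 'roo-xls' # assumption: needed for legacy .xls files with Roo 2.x
  sheet = Roo::Spreadsheet.open(path)
  (sheet.first_row..sheet.last_row).map { |i| sheet.row(i) }
end
```

Usage would look something like `path = download_read_only(url, 'TOTL_All_Holdings.xls')` followed by `header, *cells = read_rows(path)`, assuming the first row is the header (check this against the actual file). Because the copy is read-only, downloading to the same path a second time will fail unless you remove the old file first.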