ruby-on-rails ruby excel ruby-on-rails-4 roo-gem

Rails 4: How can I import an Excel file directly from a URL?


Navigate here and search for "Download All Holdings" to find the link to the file that I want to scrape (headers and cell contents).

When I pass the file's link to open-uri or Roo, I get back the page source rather than the table contents.

Using Ruby, how can I read the contents of this file? I would ideally like to extract the contents and save the original file in read-only format.

Note: I am already using Mechanize/Nokogiri to scrape and want to supplement/validate my scraping with linked Excel files like the one above.


Solution

  • Use Roo::Spreadsheet rather than Roo::Excelx. Roo::Spreadsheet.open accepts a remote URL directly and picks the reader from the file extension, which matters here because the file is .xls, not .xlsx:

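    require 'roo'  # on Roo 2.x, .xls support lives in the separate roo-xls gem
    url = 'https://www.spdrs.com/site-content/xls/TOTL_All_Holdings.xls?fund=TOTL&docname=All+Holdings&onyx_code1=1286&onyx_code2='
    # Downloads the file to a temp location and opens it with the reader matching its extension
    sheet = Roo::Spreadsheet.open(url)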
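
The question also asks about keeping the original file in read-only form. A minimal sketch of one way to do that with only the standard library is below. The helper names `save_read_only` and `download_read_only` are hypothetical, not part of Roo or open-uri, and the sketch assumes Ruby 2.5+ for `URI.open` (on older Rubies, open-uri's `open(url)` plays the same role).

```ruby
require 'open-uri'

# Hypothetical helper: writes the raw bytes to `path` unchanged and then
# removes write permission, so the saved copy stays as downloaded.
def save_read_only(bytes, path)
  File.binwrite(path, bytes)   # binary write, no newline or encoding conversion
  File.chmod(0o444, path)      # read-only for owner, group, and others
  path
end

# Hypothetical helper: downloads `url` and stores a read-only copy at `path`.
# Assumes Ruby 2.5+ (URI.open) and that `path` does not already exist as a
# read-only file, since overwriting one would raise a permission error.
def download_read_only(url, path)
  bytes = URI.open(url, &:read)
  save_read_only(bytes, path)
end
```

Calling `download_read_only(url, 'TOTL_All_Holdings.xls')` with the URL above should leave a local copy that can then be passed to `Roo::Spreadsheet.open` in place of the URL; from there `sheet.row(1)` returns the first row (the headers, if the sheet starts with them) as an array. Saving first and parsing the local copy also means the file is only downloaded once.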