ruby xml nokogiri hpricot

XML to hash table in Ruby: Parsing list of historical inventions


I'd like to slurp the following data about historical inventions into a convenient Ruby data structure:

http://yootles.com/outbox/inventions.xml

Note that all the data is in the XML attributes.

It seems like there should be a quick solution in a couple of lines of code. With Rails there'd be Hash.from_xml, though I'm not sure it would handle the attributes properly. In any case, I need this as a standalone Ruby script. Nokogiri seems overly complicated for this simple task, judging by this code that someone posted for a similar problem: http://gist.github.com/335286. I found a purportedly simple solution using Hpricot, but it doesn't seem to handle the XML attributes. Maybe that's a simple extension? Finally, there's ROXML, but that looks even more heavyweight than Nokogiri.

To make the question concrete (and with obvious ulterior motives), let's say that an answer should be a complete Ruby script that slurps the XML from the above URL and spits out CSV like this:

id, invention, year, inventor, country
RslCn, "aerosol can", 1926, "Erik Rotheim", "Norway"
RCndtnng, "air conditioning", 1902, "Willis Haviland Carrier", "US"
RbgTmtv, "airbag, automotive", 1952, "John Hetrick", "US"
RplnNgnpwrd, "airplane, engine-powered", 1903, "Wilbur and Orville Wright", "US"

I'll work on my own answer and post it too unless someone beats me to the punch with something clearly superior. Thanks!


Solution

  • Using REXML and open-uri:

    require "rexml/document"
    require "open-uri"
    
    # URI.open is required on Ruby 3.0+, where Kernel#open no longer accepts URLs.
    doc = REXML::Document.new URI.open( "http://yootles.com/outbox/inventions.xml" ).read
    
    # Header row, using the ", " separator from the question's sample output.
    puts [ 'id', 'invention', 'year', 'inventor', 'country' ].join ', '
    # Each child of the root is one invention; its first child element is the inventor.
    doc.root.elements.each do |invention|
      inventor = invention.elements.first
      data = []
      data << invention.attributes['id']
      data << '"' + invention.attributes['name'] + '"'
      data << invention.attributes['year']
      data << '"' + inventor.attributes['name'] + '"'
      data << '"' + inventor.attributes['country'] + '"'
      puts data.join ', '
    end
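
Since the question title asks for a hash table, here is a rough sketch of a variant that first collects each invention into a plain Ruby hash and only then writes CSV with the standard `csv` library. It runs against an inline sample rather than the live URL. The element names `inventions`, `invention` and `inventor` are guesses (only the attribute names come from the script above), and `inventions_from_xml`, `to_csv`, `SAMPLE_XML` and `COLUMNS` are hypothetical names made up for this example.

```ruby
require "rexml/document"
require "csv"

# Hypothetical sample. The element names are assumed; the attribute names
# (id, name, year, country) are the ones the script above reads.
SAMPLE_XML = <<~XML
  <inventions>
    <invention id="RslCn" name="aerosol can" year="1926">
      <inventor name="Erik Rotheim" country="Norway"/>
    </invention>
    <invention id="RbgTmtv" name="airbag, automotive" year="1952">
      <inventor name="John Hetrick" country="US"/>
    </invention>
  </inventions>
XML

COLUMNS = %w[id invention year inventor country].freeze

# Turn the XML text into an array of hashes, one per invention.
def inventions_from_xml(xml)
  doc = REXML::Document.new(xml)
  doc.root.elements.map do |invention|
    inventor = invention.elements[1] # REXML indexes child elements from 1
    {
      "id"        => invention.attributes["id"],
      "invention" => invention.attributes["name"],
      "year"      => invention.attributes["year"],
      "inventor"  => inventor.attributes["name"],
      "country"   => inventor.attributes["country"]
    }
  end
end

# Render the hashes as CSV; the csv library quotes a field only when needed.
def to_csv(rows)
  CSV.generate do |csv|
    csv << COLUMNS
    rows.each { |row| csv << row.values_at(*COLUMNS) }
  end
end

puts to_csv(inventions_from_xml(SAMPLE_XML))
```

Note that this does not reproduce the exact layout requested in the question: the `csv` library separates fields with a bare comma and quotes only the fields that need it (such as "airbag, automotive"), whereas the hand-built version above quotes every string field. To try it on the real data, pass `URI.open(url).read` (with `require "open-uri"`) in place of `SAMPLE_XML`, assuming the live file has the same shape.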