ruby-on-rails ruby xml xpath rexml

How to get child node of an XML page using Ruby and REXML


I am using Ruby version 1.9.3. Here is a simple version of the actual XML page that I want to get information from. I need to access it from a secure website which requires login credentials. I can't use Nokogiri because I wasn't able to log into the website using it.

<root>
  <person>
    <name>Jack</name>
    <age>10</age>
  </person>
  <person>
    <name>Jones</name>
  </person>
  <person>
    <name>Jon</name>
    <age>16</age>
  </person>
</root>

As you can see, the age tag is sometimes missing. Using REXML with Ruby, I run the following code:

require 'mechanize'
require 'rexml/document'

agent = Mechanize.new
xml = agent.get("https://securewebsite.com/page.xml")
document = REXML::Document.new(xml.body)

name = REXML::XPath.match(document, "//person/name").map { |x| x.text }
# => ["Jack", "Jones", "Jon"]

age = REXML::XPath.match(document, "//person/age").map { |x| x.text }
# => ["10", "16"]

The problem is that I can't associate each age with the correct name because the indices no longer line up. For example, name[1] is Jones but age[1] is 16, which is wrong because the person tag for Jones has no age tag.

Is there any way to get the age array to come out as ["10", nil, "16"], so that I can associate each name with its corresponding age?

Or is there a better way? Let me know if further explanation is required.


Solution

  • The problem is that we are treating age and name as completely separate collections. What we need to do instead is collect the information one person at a time.

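    require 'nokogiri'

    xml = "<your xml here />"
    doc = Nokogiri::XML(xml)

    # Select each <person> node, then query name and age relative to it
    persons = doc.xpath("//person")
    persons_data = persons.map { |person|
      {
        name: person.xpath("./name").text,
        age: person.xpath("./age").text   # "" when <age> is missing
      }
    }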
    

    This gets the person nodes and then extracts the related information from each one. A missing age comes back as an empty string rather than nil, giving the result:

    puts persons_data.inspect
    #=> [
    #     {:name=>"Jack", :age=>"10"},
    #     {:name=>"Jones", :age=>""},
    #     {:name=>"Jon", :age=>"16"}
    #   ]
    

    So to get the name and age of the first person, you would call the following (the hash keys are symbols, so string keys such as "name" would return nil):

    persons_data[0][:name] #=> "Jack"
    persons_data[0][:age]  #=> "10"
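
Since the question rules out Nokogiri, the same per-person idea can also be sketched with REXML alone. This is a minimal sketch, assuming the XML has the structure shown in the question. The inline xml string stands in for xml.body from the Mechanize call, and the variable names persons_data, names and ages are only illustrative. A missing age comes out as nil here, which is the array shape the question asks for.

```ruby
require 'rexml/document'

# Sample data copied from the question; in practice this would be xml.body
xml = <<-XML
<root>
  <person>
    <name>Jack</name>
    <age>10</age>
  </person>
  <person>
    <name>Jones</name>
  </person>
  <person>
    <name>Jon</name>
    <age>16</age>
  </person>
</root>
XML

document = REXML::Document.new(xml)

# Walk the <person> elements so each name stays paired with its own age
persons_data = REXML::XPath.match(document, "//person").map do |person|
  name_node = person.elements["name"]
  age_node  = person.elements["age"]   # nil when the person has no <age>
  {
    :name => name_node && name_node.text,
    :age  => age_node && age_node.text
  }
end

names = persons_data.map { |p| p[:name] }
ages  = persons_data.map { |p| p[:age] }

p names  # => ["Jack", "Jones", "Jon"]
p ages   # => ["10", nil, "16"]
```

Because both arrays are built from the same list of person elements, names[i] and ages[i] always refer to the same person.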