ruby, xml-parsing, nokogiri, libxml-ruby

XML parsing using Ruby for provided URL


I am trying to use Nokogiri to parse XML that I fetch from a URL, but I am not able to build an array from it that is accessible throughout the project.

My XML:

<component name="Hero">
    <topic name="i1">
      <subtopic name="">
          <links>
            <link Dur="" Id="" type="article">
                <label>I am here First. </label>

    <topic name="i2">
      <subtopic name="">
          <links>
            <link Dur="" Id="" type="article">
                <label>I am here Fourth. </label>
                <label>I am here Sixth. </label>
    <topic name="i3">
      <subtopic name="">
          <links>
            <link Dur="" Id="" type="article">
                <label>I am here Fourth. </label>

I am planning to create an array for each topic that contains the labels nested inside it. For example:

hro_array = ["I am here First.", "I am here Second.", "I am here Third."]


Solution

  • Assuming your XML is well formed and valid (every nested tag properly closed, etc.), you only need to fetch the contents of the URL (e.g. with open-uri from the standard library) and then query the parsed document (e.g. with XPath) to retrieve the desired data.

    For example, assuming you want a hash of topic name to a list of nested labels:

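    require 'open-uri'
    require 'nokogiri'
    
    # Builds a hash that maps each topic's name to the text of its nested labels.
    def topic_label_hash(doc)
      doc.xpath('//topic').each_with_object({}) do |topic, hash|
        # The leading './/' restricts the search to labels inside this topic.
        labels = topic.xpath('.//label/text()').map(&:to_s)
        name = topic.attr('name')
        hash[name] = labels
      end
    end
    
    # my_url is a placeholder for the URL of your XML document.
    # Use URI.open: on Ruby 3.0+ Kernel#open no longer handles URLs.
    xml = URI.open(my_url)
    doc = Nokogiri::XML(xml)
    topic_label_hash(doc) # =>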
    # {
    #   "TV" => [
    #     "I am here First. ",
    #     "I am here Second. ",
    #     "I am here Third. ",
    #     ...
    #   ],
    #   "Internet" => [
    #     "I am here Fourth. ",
    #     "I am here Fifth. ",
    #     "I am here Sixth. "
    #   ],
    #   ...
    # }