I am trying to use Nokogiri to parse XML that I am getting from a URL, but I am not able to turn it into an array that would be accessible throughout the project.
My XML:
<component name="Hero">
<topic name="i1">
<subtopic name="">
<links>
<link Dur="" Id="" type="article">
<label>I am here First. </label>
<topic name="i2">
<subtopic name="">
<links>
<link Dur="" Id="" type="article">
<label>I am here Fourth. </label>
<label>I am here Sixth. </label>
<topic name="i3">
<subtopic name="">
<links>
<link Dur="" Id="" type="article">
<label>I am here Fourth. </label>
I am planning to create an array for each topic, containing the labels inside it. For example:
hro_array = ["I am here First.", "I am here Second.", "I am here Third."]
Assuming your XML is well formed and valid (nested tags properly closed, etc.), you simply need to fetch the contents of the URL (e.g. using the built-in open-uri library) and then use an XML parsing technique (e.g. XPath) to retrieve the desired data.
For example, assuming you want a hash mapping each topic name to a list of its nested labels:
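require 'open-uri'
require 'nokogiri'

# Returns a hash mapping each topic's name attribute to the text of
# every <label> element nested anywhere beneath that topic.
def topic_label_hash(doc)
  doc.xpath('//topic').each_with_object({}) do |topic, hash|
    labels = topic.xpath('.//label').map(&:text)
    name = topic['name']
    hash[name] = labels
  end
end

# URI.open (Ruby 2.5+) is the supported way to open a URL; passing a URL
# to the bare Kernel#open no longer works as of Ruby 3.0.
xml = URI.open(my_url)
doc = Nokogiri::XML(xml)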
topic_label_hash(doc) # =>
# {
# "TV" => [
# "I am here First. ",
# "I am here Second. ",
# "I am here Third. ",
# ...
# ],
# "Internet" => [
# "I am here Fourth. ",
# "I am here Fifth. ",
# "I am here Sixth. "
# ],
# ...
# }