ruby xml namespaces nokogiri google-shopping

Can't retrieve data in Google namespace from XML document with Nokogiri


I have this Google shopping feed:

<?xml version="1.0" encoding="utf-8" ?>
<rss version="2.0" xmlns:g="http://base.google.com/ns/1.0">
<channel>
  <item>
    <title>test</title>
    <g:id>1</g:id>
    <g:color>blue</g:color>
  </item>
  <item>
    <title>test2</title>
    <g:id>2</g:id>
    <g:color>red</g:color>
  </item>
</channel></rss>

I've been searching for several days and can't find the answer. I also worked through the Nokogiri documentation, but that didn't clear anything up.

What I am trying to do:

require 'nokogiri'

# feed holds the Google Shopping feed shown above, as a String
doc = Nokogiri::XML(feed)
doc.css('channel > item').each do |item|
  puts item.css('g:id')
end

But this returns nothing. I've tried many suggestions, but none seem to work. Clearly I'm missing something here, but I can't figure out what.

Another thing that I can't figure out is retrieving a list of all attributes in an item. So my question is: how can I retrieve the following array from the Google Shopping feed?

# attributes => ['title', 'g:id', 'g:color']

Solution

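  • Try using at_xpath along with text. In a CSS selector the colon introduces a pseudo-class, so css('g:id') does not match the namespaced element. XPath does accept the prefix:name form, and Nokogiri automatically registers the namespaces declared on the root node, so g:id resolves: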

    doc.css('channel > item').each do |item|
      puts item.at_xpath('g:id').text
    end
    #=> 1
    #=> 2
    

    "Another thing that I can't figure out is retrieving a list of all attributes in an item."

    You could get an array of the child element names for each item like this:

    doc.css('channel > item').map do |item|
      item.element_children.map do |key|
        prefix = "#{key.namespace.prefix}:" if key.namespace
        name   = key.name
    
        "#{prefix}#{name}"
      end
    end
    #=> [["title", "g:id", "g:color"], ["title", "g:id", "g:color"]]
    

    If every item has exactly the same child elements, you can use just the first item instead of iterating over all of them:

    doc.css('channel > item').first.element_children.map do |key|
      prefix = "#{key.namespace.prefix}:" if key.namespace
      name   = key.name
    
      "#{prefix}#{name}"
    end
    #=> ["title", "g:id", "g:color"]