How to convert XML to hash in Rails where empty array and nil values are preserved


When I convert an XML structure to a hash with Hash.from_xml(@xml) in Rails, the parser does not distinguish between empty arrays and nil values. In my XML, self-closing nodes such as <audio_languages/> are meant to be empty arrays, whereas nodes with the attribute nil="true" are meant to be nil values.

The XML structure (I control how it is generated) looks like this:

<response>
  <medias>
    <media>
      <id>1</id>
      <name>Media-1</name>
      <audio_languages/>
      <avg_rating nil="true"></avg_rating>
    </media>
    <media>
      <id>2</id>
      <name>Media-2</name>
      <audio_languages/>
      <avg_rating nil="true"></avg_rating>
    </media>
  </medias>
</response>

The expected output from Hash.from_xml(@xml) would be:

{"response"=>{"medias"=>{"media"=>[{"id"=>"1", "name"=>"Media-1", "audio_languages"=>[], "avg_rating"=>nil}, {"id"=>"2", "name"=>"Media-2", "audio_languages"=>[], "avg_rating"=>nil}]}}} 

Instead, I get nil for both audio_languages and avg_rating:

{"response"=>{"medias"=>{"media"=>[{"id"=>"1", "name"=>"Media-1", "audio_languages"=>nil, "avg_rating"=>nil}, {"id"=>"2", "name"=>"Media-2", "audio_languages"=>nil, "avg_rating"=>nil}]}}}  

Solution

  • I ended up parsing the nodes with libxml and checking whether each empty node has the signature I am looking for, in order to decide whether to convert it to an empty array or to a nil value.

    # Usage: Hash.from_xml_with_libxml(xml)
    require 'xml/libxml'
    # adapted from 
    # http://movesonrails.com/articles/2008/02/25/libxml-for-active-resource-2-0
    
    class Hash 
      class << self
        def from_xml_with_libxml(xml, strict=true) 
          LibXML::XML.default_load_external_dtd = false
          LibXML::XML.default_pedantic_parser   = strict
          result = LibXML::XML::Parser.string(xml).parse 
          return { result.root.name.to_s => xml_node_to_hash_with_libxml(result.root)}
        end 
    
        def xml_node_to_hash_with_libxml(node) 
          # Element nodes become hashes (or [] / nil when empty); other nodes, such as text, become strings.
          if node.element? 
            if node.children? 
              result_hash = {} 
    
              node.each_child do |child| 
                result = xml_node_to_hash_with_libxml(child) 
    
                if child.name == "text"
                  if !child.next? and !child.prev?
                    return result
                  end
                elsif result_hash[child.name]
                  if result_hash[child.name].is_a?(Array)
                    result_hash[child.name] << result
                  else
                    result_hash[child.name] = [result_hash[child.name]] << result
                  end
                else 
                  result_hash[child.name] = result
                end
              end
              return result_hash 
            else 
              # Self-closing nodes such as <audio_languages/> become empty arrays;
              # nodes carrying a nil attribute, such as <avg_rating nil="true"/>, become nil.
              if node.to_s.match(/^\<(.+)\/\>$/) && node.attributes["nil"].nil?
                return []
              end
              return nil 
            end 
          else 
            return node.content.to_s 
          end 
        end          
      end
    end
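
For readers without libxml installed, the same rule can be sketched with REXML, which ships with Ruby. This is only an illustration, not the code I actually use: the helper names xml_to_hash and xml_element_to_value are hypothetical, and unlike the libxml version it treats an empty node as nil only when the attribute is exactly nil="true".

```ruby
require 'rexml/document'

# Hypothetical helper: converts one REXML element into a Hash, String, [] or nil.
def xml_element_to_value(element)
  if element.has_elements?
    result   = {}
    repeated = {} # names that have already been turned into arrays

    element.elements.each do |child|
      value = xml_element_to_value(child)
      name  = child.name

      if result.key?(name)
        # Second and later siblings with the same name are collected into an array.
        result[name] = [result[name]] unless repeated[name]
        repeated[name] = true
        result[name] << value
      else
        result[name] = value
      end
    end
    result
  elsif element.text && !element.text.empty?
    element.text               # leaf node with text content
  elsif element.attributes['nil'] == 'true'
    nil                        # <avg_rating nil="true"></avg_rating>
  else
    []                         # <audio_languages/>
  end
end

# Hypothetical entry point, mirroring the shape of Hash.from_xml's result.
def xml_to_hash(xml)
  root = REXML::Document.new(xml).root
  { root.name => xml_element_to_value(root) }
end

xml = <<~XML
  <response>
    <medias>
      <media>
        <id>1</id>
        <name>Media-1</name>
        <audio_languages/>
        <avg_rating nil="true"></avg_rating>
      </media>
      <media>
        <id>2</id>
        <name>Media-2</name>
        <audio_languages/>
        <avg_rating nil="true"></avg_rating>
      </media>
    </medias>
  </response>
XML

p xml_to_hash(xml)
```

Tracking repeated names separately avoids mistaking a first value of [] for an already started collection when a self-closing node appears more than once under the same parent.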