xml perl xml-libxml

XML::LibXML: extracting the children and grandchildren of an XML tag in Perl


XML data:

<libraries>
  <group name="stdcell_globalsubtypes">
    <cell type="a" optional="1">
      <cell type="b" optional="1">
        <cell type="c" optional="1">
          <cell type="d" optional="1">
            <cell type="e" optional="1"/>
          </cell>
        </cell>
      </cell>
    </cell>
  </group>
</libraries>

How can I access all the child and grandchild nodes of the group with name="stdcell_globalsubtypes" without having to walk through each level by hand using getChildrenByTagName("cell")?

I need to parse this XML data and build a hash from it, such as %hash = ('1' => 'a', '2' => 'b', '3' => 'c', '4' => 'd', '5' => 'e').

Is there an API call that returns all the child nodes and their descendants? If not, how can I do it recursively?

Thanks in advance :)


Solution

  • I'm not an XML expert, and there is probably a more efficient way to solve this, but one way to do it is with a recursive function:

    use strict;
    use warnings FATAL => 'all';
    use XML::LibXML;
    
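    # Return the "type" attribute of every <cell> below $node, depth-first,
    # in document order.
    sub extract_cell_types {
        my $node = shift;
        my @return_array;
        # getChildrenByTagName only sees direct children, hence the recursion.
        my @cells = $node->getChildrenByTagName("cell");
        for my $cell (@cells) {
            my $type = $cell->getAttribute("type");
            push @return_array, $type;
            if ($cell->hasChildNodes) {
                push @return_array, extract_cell_types($cell);
            }
        }
        return @return_array;
    }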
    
    my $doc = XML::LibXML->load_xml(string => <<'END');
    <doc>
    <group name="stdcell_globalsubtypes">
     <cell type="a" optional="1">
      <cell type="b" optional="1">
       <cell type="c" optional="1" >
        <cell type="d" optional="1" >
         <cell type="e" optional="1"/>
        </cell>
       </cell>
      </cell>
     </cell>
    </group>
    </doc>
    END
    
    my $doce = $doc->getDocumentElement;
    
    my @types;
    my @groups = $doce->getChildrenByTagName("group");
    for my $gn (@groups) {
        if ($gn->getAttribute("name") eq "stdcell_globalsubtypes") {
            push @types, extract_cell_types($gn);
        }
    }
    
    print join(', ', @types) . "\n";
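
As for the "is there an API" part of the question: a possible alternative, sketched here under the assumption that XPath is acceptable, is to let `findnodes` collect every descendant `<cell>` in one call and then number the results to build the hash. The variable names (`@cells`, `%hash`, `$i`) and the `<libraries>` wrapper around the sample data are illustrative choices, not part of the answer above.

```perl
use strict;
use warnings;
use XML::LibXML;

# Same sample data as in the question, wrapped in a closed <libraries> root.
my $doc = XML::LibXML->load_xml(string => <<'END');
<libraries>
 <group name="stdcell_globalsubtypes">
  <cell type="a" optional="1">
   <cell type="b" optional="1">
    <cell type="c" optional="1">
     <cell type="d" optional="1">
      <cell type="e" optional="1"/>
     </cell>
    </cell>
   </cell>
  </cell>
 </group>
</libraries>
END

# '//cell' after the group step matches <cell> elements at any depth
# below the selected group, not just its direct children.
my @cells = $doc->findnodes('//group[@name="stdcell_globalsubtypes"]//cell');

# Number the cells 1..N in the order they were returned.
my %hash;
my $i = 0;
$hash{ ++$i } = $_->getAttribute('type') for @cells;

print "$_ => $hash{$_}\n" for sort { $a <=> $b } keys %hash;
```

If XPath is not wanted, `getElementsByTagName("cell")` called on the group element should behave similarly, since unlike `getChildrenByTagName` it is not restricted to immediate children.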