perl xml-parsing xml-twig

Perl: How to treat the next XML tag as a child of the previous one?


In the following data file, I want to treat each <Field> tag as a child of the preceding <Register>, and each <Register> as a child of the preceding <Partition>. Basically, I am trying to extract the details of each <Partition> together with its corresponding <Register> and <Field> elements. Since all these tags are siblings rather than being nested in a parent-child relationship, how can I get my desired output?

Since the file is very large, I do not want to restructure it into a parent-child hierarchy by hand, as that would require find/replace and manual intervention.

<Partition>
    <Name>1</Name>
    <Abstract>2</Abstract>
    <Description>3</Description>
    <ParentName>4</ParentName>

    </Partition>
    <Partition>
    <Name>8</Name>
    <Abstract></Abstract>
    <Description>9</Description>
    <ParentName>10</ParentName>

    </Partition>
    <Register>
    <Name>12</Name>
    <Abstract></Abstract>
    <Description>13</Description>
    <ParentName>14</ParentName>

    <Size>32</Size>
    <AccessMode>15</AccessMode>
    <Type>16</Type>


    </Register>
    <Field>
    <Name>17</Name>
    <Abstract></Abstract>
    <Description></Description>
    <ParentName></ParentName>


    </Field>
    <Field>
    .
    .
    .
    </Field>
    <Register>
    .
    .
    .

    </Register>
    <Field>
    .
    .
    .

    </Field>
    <Field>
    .
    .
    .
    </Field>
    <Partition>
        <Name>88</Name>
        <Abstract></Abstract>
        <Description></Description>
        <ParentName>55</ParentName>

    </Partition>
    <Register>
        .
        .
        .

    </Register>
    <Field>
        .
        .
        .

    </Field>
    <Partition>
        .
        .
        .
    </Partition>
    <Partition>
        .
        .
        .
    </Partition>
    <Partition>
       .
       .
       .
    </Partition>
    <Register>
        .
        .
        .
    </Register>

I am using the XML::Twig package, and here is my code snippet:

foreach my $register ( $twig->get_xpath('//Register') ) # get each <Register>
    {
        #print $register, "\n";
        my $reg_name = $register->first_child('Name')->text;
        my $reg_abstract= $register->first_child('Abstract')->text;
        my $reg_description= $register->first_child('Description')->text;
       .
       .
       .
          foreach my $xml_field ($register->get_xpath('Field'))
          {
            my $reg_field_name= $xml_field->first_child('Name')->text;
            my $reg_field_abstract= $xml_field->first_child('Abstract')->text;
            #print "$reg_field_name \n";
            .
            .
            .

          }
  }

Solution

  • As per your comment, if you want to rewrite the file with Register and Field elements as children of Partition elements, here is what you could do:

    The simplest solution loads the whole file into memory:

    #!/usr/bin/env perl
    
    use strict;
    use warnings;
    
    use XML::Twig;
    
    my $test_file= 'test.xml';
    
    XML::Twig->new( twig_handlers => { 'Register|Field' => \&child,
                                     },
                    pretty_print => 'indented',
                  )
              ->parsefile( $test_file)
              ->print;
    
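    # Called at the end of each Register or Field element: attach it to
    # the closest preceding Partition sibling.
    sub child
      { my( $t, $child)= @_;
        $child->move( last_child => $child->prev_sibling( 'Partition'));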
      }
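
    If you would rather have each Field nested under its Register, as in the original question, the same handler idea can probably be extended. The following is only a sketch under a few assumptions: the records sit directly under a single root element, a Field always belongs to the closest preceding Register, and nest_records is a hypothetical helper name (it parses a string to stay self-contained; parsefile should work the same way for a file).

    ```perl
    use strict;
    use warnings;

    use XML::Twig;

    # nest_records is a hypothetical helper: it parses an XML string and
    # returns the twig with Register under Partition and Field under Register.
    sub nest_records
      { my( $xml)= @_;

        my $twig= XML::Twig->new(
            twig_handlers => {
                # a Register goes under the closest preceding Partition
                Register => sub {
                    my( $t, $register)= @_;
                    my $partition= $register->prev_sibling( 'Partition')
                      or return;            # no Partition seen yet: leave it
                    $register->move( last_child => $partition);
                },
                # earlier Registers have already been moved, so the owner of
                # a Field is the last Register inside the preceding Partition
                Field => sub {
                    my( $t, $field)= @_;
                    my $partition= $field->prev_sibling( 'Partition')
                      or return;
                    my $register= $partition->last_child( 'Register')
                      or return;            # no Register to attach to
                    $field->move( last_child => $register);
                },
            },
        );

        $twig->parse( $xml);
        return $twig;
      }
    ```

    With the tree in this shape, nested loops like the ones in the question (each //Register, then its Field children) should find the fields under the right register.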
    

    Since you mentioned that the file can be very large, below is a slightly more complex version of the rewrite script that keeps only 2 Partition elements in memory (including the new children of the first one). When a Partition has been parsed, it uses flush_up_to to flush the tree up to the previous Partition:

    #!/usr/bin/env perl
    
    use strict;
    use warnings;
    
    use XML::Twig;
    
    my $test_file= 'test.xml';
    
    XML::Twig->new( twig_handlers => { 'Partition' => \&parent,
                                       'Register|Field' => \&child,
                                     },
                    pretty_print => 'indented',
                  )
              ->parsefile( $test_file);
    
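    # Called at the end of each Register or Field element: attach it to
    # the closest preceding Partition sibling.
    sub child
      { my( $t, $child)= @_;
        $child->move( last_child => $child->prev_sibling( 'Partition'));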
      }
    
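    # Called at the end of each Partition: the previous Partition can no
    # longer receive children, so print it and free its memory.
    sub parent
      { my( $t, $partition)= @_;
        if( my $prev_partition = $partition->prev_sibling( 'Partition'))
          { $t->flush_up_to( $prev_partition); }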
      }
    

    Note that because flush_up_to is used, the rest of the tree is flushed automatically at the end of parsing.

    If you need to write the XML to a specific file, instead of STDOUT, you can also pass a filehandle to flush_up_to.
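
    As a rough illustration of the filehandle variant, here is a minimal sketch. It assumes the records sit directly under a single root element; regroup_to_file and its arguments are hypothetical names, and it parses a string only to stay self-contained (parsefile should work the same way). As far as I understand the XML::Twig documentation, the automatic flush at the end of parsing goes to the same filehandle that was passed to the earlier flush calls.

    ```perl
    use strict;
    use warnings;

    use XML::Twig;

    # regroup_to_file is a hypothetical helper: it regroups the records found
    # in $xml and writes the result to $out_path instead of STDOUT.
    sub regroup_to_file
      { my( $xml, $out_path)= @_;

        open( my $out, '>', $out_path)
          or die "cannot create $out_path: $!";

        XML::Twig->new(
            twig_handlers => {
                # attach each Register or Field to the preceding Partition
                'Register|Field' => sub {
                    my( $t, $child)= @_;
                    my $partition= $child->prev_sibling( 'Partition')
                      or return;
                    $child->move( last_child => $partition);
                },
                # once a new Partition is complete, the previous one cannot
                # receive more children: write it to $out and free it
                'Partition' => sub {
                    my( $t, $partition)= @_;
                    if( my $prev= $partition->prev_sibling( 'Partition'))
                      { $t->flush_up_to( $prev, $out); }
                },
            },
            pretty_print => 'indented',
        )->parse( $xml);

        close( $out) or die "cannot close $out_path: $!";
        return $out_path;
      }
    ```

    Like the version above, this only starts writing once a second Partition has been parsed, so a file with a single Partition would need an explicit flush after parsing.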