Removing all but one node in Perl XML::Twig


I've got an XML file with a number of level3 elements. I want to remove all but one of them. My XML file:

<?xml version="1.0" encoding="UTF-8"?>
<level1 id="level1_id">
    <level2 id="level2_id">
        <level3 id="level3_id1">
            <attributes>
                <attribute>1</attribute>
                <attribute>2</attribute>
            </attributes>
        </level3>
        <level3 id="level3_id2">
            <attributes>
                <attribute>1</attribute>
                <attribute>2</attribute>
            </attributes>
        </level3>
        <level3 id="level3_id3">
            <attributes>
                <attribute>1</attribute>
                <attribute>2</attribute>
            </attributes>
        </level3>
    </level2>
</level1>

My Perl script:

use strict;
use warnings;
use XML::Twig;

my $filename = 'test3.xml';
my $outfile = $filename."_after";
open my $output, '>', $outfile or die "Couldn't open output file: $!\n";
my $twig = new XML::Twig (twig_handlers => { 'level2' => \&edit });
$twig->parsefile($filename);
#$twig->flush;
$twig->print($output);

sub edit {
    my ($twig, $element) = @_;
    my @elements = $element->children('level3');
    print $#elements."\n";
    # keep everything except the first level3 child (these get deleted)
    @elements = @elements[1..$#elements];
    print $#elements."\n";
    my $count = 0;
    foreach (@elements){
        $count++;
        $_->delete;
    }
    print $count;
    $twig->purge;
}

However, this leaves only an empty level1 element:

<?xml version="1.0" encoding="UTF-8"?>
<level1 id="level1_id"></level1>

On the other hand, my script works fine when level2 is the root element. Here is an example XML file and the result after processing:

<?xml version="1.0" encoding="UTF-8"?>

<level2 id="level2_id">
    <level3 id="level3_id1">
        <attributes>
            <attribute>1</attribute>
            <attribute>2</attribute>
        </attributes>
    </level3>
    <level3 id="level3_id2">
        <attributes>
            <attribute>1</attribute>
            <attribute>2</attribute>
        </attributes>
    </level3>
    <level3 id="level3_id3">
        <attributes>
            <attribute>1</attribute>
            <attribute>2</attribute>
        </attributes>
    </level3>
</level2>

Result:

<?xml version="1.0" encoding="UTF-8"?>
<level2 id="level2_id">
    <level3 id="level3_id1">
        <attributes>
            <attribute>1</attribute>
            <attribute>2</attribute>
        </attributes>
    </level3>
</level2>

This is exactly what I want, i.e. only one level3 element left. What am I doing wrong? Does it have to do with how I define the twig handlers?

I don't want to hard-code the XML structure, for example:

my $twig = new XML::Twig (twig_handlers => { 'level1/level2' => \&edit });

I don't know how deep level2 will be in an actual XML file, and the actual files might not be identical in structure, so this part should be dynamic.


Solution

  • There is no need for the line $twig->purge, or anything like it, and I don't understand why you have written it.

    It discards everything that has been parsed but not yet printed to the output, which in your case is the whole of the level2 element that you have just edited. Only the still-open level1 element survives, which is why your output contains nothing else.

    I also recommend that you write

    my $twig = XML::Twig->new(
        twig_handlers => { level2 => \&edit },
        pretty_print  => 'indented',
    );
    

    as the indirect object syntax that you have used is ambiguous and prone to errors. The pretty_print option will also make the output XML more readable.
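Putting both suggestions together, a corrected version of the script might look roughly like the sketch below. It keeps the question's edit handler and the test3.xml / _after file naming, drops the purge call, and builds the twig with XML::Twig->new and pretty_print. The make_twig helper and the command-line handling are hypothetical additions for illustration only. Because the handler is keyed on the bare tag name level2, XML::Twig calls it for a level2 element wherever it appears, so a path such as 'level1/level2' does not need to be hard-coded.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use XML::Twig;

# twig_handlers callback, called once each <level2> element has been fully
# parsed: keep its first <level3> child and delete the others.
# There is deliberately no $twig->purge here, so the edited element stays
# in the tree and is printed later.
sub edit {
    my ($twig, $level2) = @_;
    my @level3 = $level2->children('level3');
    # The slice is empty when there are zero or one <level3> children.
    $_->delete for @level3[1 .. $#level3];
    return 1;
}

# Hypothetical helper: builds the twig with the settings recommended above.
sub make_twig {
    return XML::Twig->new(
        # A bare tag name matches <level2> at any depth, so no path
        # such as 'level1/level2' has to be hard-coded.
        twig_handlers => { level2 => \&edit },
        pretty_print  => 'indented',
    );
}

# Usage: perl script.pl test3.xml   (writes test3.xml_after)
if (@ARGV) {
    my $filename = $ARGV[0];
    my $outfile  = $filename . '_after';
    open my $output, '>', $outfile or die "Couldn't open $outfile: $!\n";
    my $twig = make_twig();
    $twig->parsefile($filename);
    $twig->print($output);
    close $output or die "Couldn't close $outfile: $!\n";
}
```

With the purge call gone, the same code should handle both the level1-wrapped file and the file where level2 is the root.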