xml perl xml-comments

How to ignore XML comments when parsing XML?


I want to collect all the tags from an XML file. How can I remove only the comment syntax, keeping the commented-out content?

XML File:

<xml>
<contrib-group>
<contrib contrib-type="author">
<name>
<surname>Holt</surname>
<given-names> Maurice<!--<xref ref-type="fn" rid="fnI_1"><sup>1</sup></xref>--></given-names>
</name>
</contrib>
</contrib-group>
</xml>

I need the output to be:

<xml>
<contrib-group>
<contrib contrib-type="author">
<name>
<surname>Holt</surname>
<given-names> Maurice<xref ref-type="fn" rid="fnI_1"><sup>1</sup></xref></given-names>
</name>
</contrib>
</contrib-group>
</xml>

How can I remove the comment markers without removing their contents?

Script:

#!/usr/bin/perl
use warnings;
use strict;
use XML::Twig;

open(my $output , '>', "split.xml") || die "can't open the Output $!\n";
my $xml = XML::Twig->new( twig_handlers => { xref => sub{comments => 'drop'} } );
$xml->parsefile("sample.xml");
$xml->print($output);

This does not work. How can I remove only the <!-- and --> markers without removing what is inside them?


Solution

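  • comments is an option of the XML::Twig constructor, not something a handler can return, so the xref handler in the question has no effect. Pass comments => 'process' to new() so that comments become nodes in the twig, then use a '#COMMENT' handler to replace each comment with its parsed contents:

    #!/usr/bin/perl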
    use warnings;
    use strict;
    
    use XML::Twig;
    
    open my $output , '>', 'split.xml' or die "Can't open: $!\n";
    my $xml = XML::Twig->new( comments      => 'process',       # Turn on comment processing
                              twig_handlers =>
                                  { '#COMMENT' => \&uncomment }
                            );
    $xml->parsefile('sample.xml');
    $xml->print($output);
    
    sub uncomment {
        my ($xml, $comment) = @_;
        $comment->set_outer_xml($comment->text);                # Replace the comment with its contents.
    }
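
To try the same idea without touching any files, here is a minimal sketch that parses an in-memory string and collects the result with sprint. The $input string is a hypothetical sample cut down from the question's file, and the variable names are made up for illustration. It assumes every comment contains a well-formed XML fragment; a comment holding plain text would probably make set_outer_xml fail, so treat this as a sketch rather than a general solution.

```perl
#!/usr/bin/perl
use warnings;
use strict;

use XML::Twig;

# Hypothetical sample, reduced from the question's sample.xml.
my $input = '<name><given-names> Maurice<!--<xref ref-type="fn" rid="fnI_1">'
          . '<sup>1</sup></xref>--></given-names></name>';

my $twig = XML::Twig->new(
    comments      => 'process',    # keep comments as nodes in the twig
    twig_handlers => {
        '#COMMENT' => sub {
            my ($t, $comment) = @_;
            # Re-parse the comment's text as XML and put it in the comment's place.
            $comment->set_outer_xml($comment->text);
        },
    },
);

$twig->parse($input);              # parse a string instead of a file
my $result = $twig->sprint;        # serialize the whole document to a string
print "$result\n";
```

If the printed string no longer contains <!-- and the xref element shows up inside given-names, the handler is doing what the question asks.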