I am using the Perl XML::LibXML module to manipulate an XML file.
I want to remove the opening and closing tags of an XML node if it has a certain attribute, so that its text and child nodes become part of the node's parent.
Here's an unsuccessful attempt. It fails with the error "insertBefore/insertAfter: HIERARCHY_REQUEST_ERR":
#!/usr/bin/env perl
use 5.020;
use warnings;
use XML::LibXML;
#the input xml
my $inputstr = <<XML;
<root>
<a>
<b class="deletethistag">keep this text<c>keep this c node</c>keep this text too</b>
<b class="someothertag">don't change this</b>
<b>don't change this node without an attribute</b>
<c class="type1">don't change this either</c>
</a>
</root>
XML
my $desiredstr = <<XML ;
<root>
<a>keep this text<c>keep this c node</c>keep this text too
<b class="someothertag">don't change this</b>
<b>don't change this node without an attribute</b>
<c class="type1">don't change this either</c>
</a>
</root>
XML
my $dom = XML::LibXML->load_xml(
    string => $inputstr
);
# Convert $inputstr to $desiredstr *** doesn't work ***
foreach my $node ($dom->findnodes(q#//a/b[@class="deletethistag"]/*#)) {
    my $nodestring = $node->toString(1);
    say STDERR $nodestring;
    my $replacementnode = XML::LibXML->load_xml(string => $nodestring);
    $node->parentNode()->insertAfter($replacementnode, $node);
    $node->unbindNode();
}
say $dom->toString(1);
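A quick way to see what the loop above is actually trying to insert is to re-parse one matched node the same way and inspect the result. This sketch uses a trimmed, single-line copy of the question's input (an assumption, to keep it short) and wraps the insert in eval so the failure can be observed instead of aborting the script.

```perl
#!/usr/bin/env perl
use 5.020;
use warnings;
use XML::LibXML;

# Trimmed, single-line version of the question's <a> element (illustrative only)
my $dom = XML::LibXML->load_xml(string =>
    '<a><b class="deletethistag">keep this text<c>keep this c node</c>keep this text too</b></a>');
my ($c) = $dom->findnodes('//a/b[@class="deletethistag"]/*');

# Re-parsing the serialised node, as the question does, yields a document
# node wrapping a copy of <c>, not an element node.
my $copy = XML::LibXML->load_xml(string => $c->toString);
say ref($copy);                    # XML::LibXML::Document
say ref($copy->documentElement);   # XML::LibXML::Element

# A document node cannot become the child of an element, so the insert dies.
my $ok = eval { $c->parentNode->insertAfter($copy, $c); 1 };
say($ok ? 'inserted' : "insert refused: $@");
```

Even without that error, the loop would only move <c>: it unbinds the matched child rather than the <b class="deletethistag"> wrapper, so the wrapper would stay in place.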
I want to use this code to remove <span lang="en" xml:space="preserve">...</span> markup from a file, but I have framed it as a more general question so that I understand more of the details of working with XML::LibXML.
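The attempt fails because XML::LibXML->load_xml returns a whole document node, and a document node cannot be inserted as the child of an element. It would also miss the loose text, because the * in the XPath matches element children only.
Instead, use $node->childNodes(), which returns all the children of $node: text nodes as well as element sub-nodes.
Insert each child into $node's parent immediately before $node, so the children end up at the same place as $node. Then delete the original $node with $node->unbindNode().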
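To see the difference between childNodes() and the * step, the sketch below lists both for the <b class="deletethistag"> element. The single-line input is an assumption made so that there are no whitespace-only text nodes to clutter the output.

```perl
#!/usr/bin/env perl
use 5.020;
use warnings;
use XML::LibXML;

# Single-line copy of the question's <a> element (illustrative only)
my $dom = XML::LibXML->load_xml(string =>
    '<a><b class="deletethistag">keep this text<c>keep this c node</c>keep this text too</b></a>');
my ($wrapper) = $dom->findnodes('//a/b[@class="deletethistag"]');

# childNodes() returns every child, text nodes included ...
my @children = $wrapper->childNodes();
# ... whereas the XPath step * (evaluated relative to $wrapper) matches elements only.
my @elements = $wrapper->findnodes('*');

say join ', ', map { $_->nodeName } @children;   # #text, c, #text
say join ', ', map { $_->nodeName } @elements;   # c
```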
Here's a working script:
#!/usr/bin/env perl
use 5.020;
use warnings;
use XML::LibXML;
#the input xml
my $inputstr = <<XML;
<root>
<a>
<b class="deletethistag">keep this text<c>keep this c node</c>keep this text too</b>
<b class="someothertag">don't change this</b>
<b>don't change this node without an attribute</b>
<c class="type1">don't change this either</c>
</a>
</root>
XML
my $desiredstr = <<XML ;
<root>
<a>
keep this text<c>keep this c node</c>keep this text too
<b class="someothertag">don't change this</b>
<b>don't change this node without an attribute</b>
<c class="type1">don't change this either</c>
</a>
</root>
XML
my $dom = XML::LibXML->load_xml(
    string => $inputstr
);
# Move every child of each matching <b> (text nodes included) in front of it,
# then remove the now-empty <b>.
for my $node ($dom->findnodes(q#//a/b[@class="deletethistag"]#)) {
    my $parent = $node->parentNode();
    for my $child_node ( $node->childNodes() ) {
        $parent->insertBefore($child_node, $node);
    }
    $node->unbindNode();
}
say $dom->toString();
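For the <span lang="en" xml:space="preserve"> case from the question, the same steps can be wrapped in a small helper. This is a sketch: the name unwrap and the sample <p> document are made up for illustration, and it relies on libxml2 predefining the xml: prefix in XPath, so @xml:space can be matched without registering a namespace.

```perl
#!/usr/bin/env perl
use 5.020;
use warnings;
use XML::LibXML;

# Hypothetical helper: replace $node by its children, in place.
sub unwrap {
    my ($node) = @_;
    my $parent = $node->parentNode or return;
    # childNodes() builds the list first, so moving nodes while looping is safe
    $parent->insertBefore($_, $node) for $node->childNodes();
    $node->unbindNode();
    return;
}

# Made-up sample input containing the markup to strip
my $dom = XML::LibXML->load_xml(string => <<'XML');
<p>Some <span lang="en" xml:space="preserve">English <i>text</i></span> here.</p>
XML

unwrap($_) for $dom->findnodes('//span[@lang="en" and @xml:space="preserve"]');

say $dom->documentElement->toString;
# <p>Some English <i>text</i> here.</p>
```

Because the children are inserted one by one before the original node, they keep their original order, and nested markup such as <i> survives untouched.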