perl xml-libxml

Add new class to an element using XML::LibXML


I have a simple HTML snippet as follows:

<div class="out"><p class="a">This is a paragraph. <p class="b">another</p></p></div>

I need to add a new class b to the outer p element, so that the result is:

<div class="out"><p class="a b">This is a paragraph. <p class="b">another</p></p></div>

The inner p element should remain the same.

Is this possible?


Solution

  • Find the right element and change its attribute using the setAttribute method.

    The code below follows requirements clarified in a comment: the paragraph of interest is the first one below a div, and its class attribute needs to be appended to if it exists, or created if it doesn't.

    use warnings;
    use strict;
    use feature 'say';
    
    use XML::LibXML;
    
    my $xml = q(<div class="out"><p class="a">This is a paragraph. )
            . q(<p class="b">another</p></p></div>);
    
    my $doc = XML::LibXML->load_xml(string => $xml);
    
    my @nodes = $doc->findnodes('//p');
    
    for my $node (@nodes) { 
        if ($node->parentNode->nodeName eq 'div') {
            my $attr_val = $node->getAttribute('class');
            $node->setAttribute(
                class => ($attr_val ? "$attr_val " : '') . 'b'
            );
        }   
    }
    
    say $doc;
    

    This still assumes that the p has no sibling paragraphs (other p elements directly under the same div), or that any such siblings should be changed in the same way. If that's not the case, then those siblings need to be identified and skipped.
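    One way to skip such siblings, sketched here under the assumption that only the first p directly under each div should change, is a positional predicate in the XPath expression. The input with an extra sibling paragraph is made up for illustration.

```perl
use warnings;
use strict;
use feature 'say';

use XML::LibXML;

# Hypothetical input: a second paragraph directly under the same div
my $xml = q(<div class="out"><p class="a">first <p class="b">inner</p></p>)
        . q(<p class="c">sibling</p></div>);

my $doc = XML::LibXML->load_xml(string => $xml);

# p[1] selects only the first p child of each div,
# so the sibling (and the nested p) are left alone
for my $node ($doc->findnodes('//div/p[1]')) {
    my $attr_val = $node->getAttribute('class');
    $node->setAttribute(
        class => ($attr_val ? "$attr_val " : '') . 'b'
    );
}

say $doc->documentElement->toString;
```

    Here only the p with class a gets the new class; the sibling with class c and the nested p keep their attributes unchanged.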

    The search is first for every p, and each node is then checked for its parent, in case some subtle refinements creep in ("Yes, a p under a div, but except for the edge case of ..."). If that's not a concern and it is strictly the p under a div, then it is better to query for that directly:

    my @nodes = $doc->findnodes('//div/p');
    

    Then append to the class attribute or create it the same way as above.

    The complete program at the top prints

    <?xml version="1.0"?>
    <div class="out"><p class="a b">This is a paragraph. <p class="b">another</p></p></div>
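    For completeness, the direct //div/p query combined with the append-or-create step might look like the sketch below. The add_class helper is hypothetical (it is not part of XML::LibXML), and the input is made up to show both cases: a p that already has a class and one that has none.

```perl
use warnings;
use strict;
use feature 'say';

use XML::LibXML;

# Hypothetical helper (not an XML::LibXML method): append a class name,
# or create the class attribute when the element has none
sub add_class {
    my ($node, $new) = @_;
    my $attr_val = $node->getAttribute('class');   # undef if absent
    $node->setAttribute(class => ($attr_val ? "$attr_val " : '') . $new);
    return $node;
}

# Hypothetical input: one p with a class attribute, one without
my $xml = q(<body><div><p class="a">with class</p></div>)
        . q(<div><p>without class</p></div></body>);

my $doc = XML::LibXML->load_xml(string => $xml);

add_class($_, 'b') for $doc->findnodes('//div/p');

say $doc->documentElement->toString;
```

    The first p ends up with class "a b" and the second with a newly created class "b". Note that this sketch, like the original code, does not check whether the class is already present.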