perl xml-libxml

Perl XML::LibXML: get the content of direct child text nodes


Given this snippet:

<p>content 1 of p <span>content of span</span> content 2 of p </p>

I would like to obtain only the text that sits directly inside p, that is, content 1 of p and content 2 of p, but not content of span.

Is there a way to do it?


Solution

  • Using an XPath expression:

    # text() matches only the text nodes that are direct children of $node.
    for my $text_node ($node->findnodes('text()')) {
        say $text_node;
    }
    

    Without using XPath, by filtering the child nodes by type:

    for my $child_node ($node->childNodes()) {
        # Skip anything that is not a text node (here, the span element).
        next if $child_node->nodeType != XML_TEXT_NODE;
    
        say $child_node;
    }
    

    Both output the following (the whitespace around each text node is preserved):

    content 1 of p
     content 2 of p
    

    The rest of the program, which must come before either loop:

    use strict;
    use warnings;
    use feature qw( say );
    
    use XML::LibXML qw( XML_TEXT_NODE );
    
    my $xml = '<p>content 1 of p <span>content of span</span> content 2 of p </p>';
    
    my $doc = XML::LibXML->new->parse_string($xml);
    my $node = $doc->documentElement();
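
For reference, the pieces above can be assembled into one script. This is only a sketch with a few assumptions that are not part of the answer: it reads each text node through the data accessor instead of relying on stringification (which, as far as I know, serializes the node and so re-escapes characters such as &), it trims the surrounding whitespace, and it collects the results in an array named @direct_text, a name chosen here purely for illustration.

```perl
use strict;
use warnings;
use feature qw( say );

use XML::LibXML qw( XML_TEXT_NODE );

my $xml = '<p>content 1 of p <span>content of span</span> content 2 of p </p>';

my $doc  = XML::LibXML->new->parse_string($xml);
my $node = $doc->documentElement();

# Collect only the text nodes that are direct children of <p>.
# @direct_text is a hypothetical name used for this sketch.
my @direct_text;
for my $child_node ($node->childNodes()) {
    # Skip the <span> element and any other non-text child.
    next if $child_node->nodeType != XML_TEXT_NODE;

    my $text = $child_node->data;    # character data of the text node
    $text =~ s/^\s+|\s+$//g;         # assumption: surrounding whitespace is unwanted
    next if $text eq '';             # ignore whitespace-only text nodes

    push @direct_text, $text;
}

say for @direct_text;
```

With the sample input, this should print content 1 of p and content 2 of p on separate lines, without the leading and trailing spaces. If the original whitespace matters, drop the trimming line.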