Navigating XML file using XML::Twig


I must say that I am a newbie at Perl and XML::Twig, but I am a quick learner. Any help you can provide would be greatly appreciated.

Basically, I am having trouble navigating to certain nodes in an XML file so that I can extract information.

I am using a TwigHandler to get me access to a certain node in the XML, specifically the "Selection" node. The TwigHandler is working fine for me in the sense that I am able to extract some of the information I need at this level. However, there are additional nodes under "Selection" that I need to inspect and I do not know how to get to them.

I have copied a snippet of my XML below so that you can see what it looks like. In it you can see the Selection node. I am able to access the attribute "id" and the field "Name" with no problems using my Twig Handler, but I need to loop through all of those "Message" nodes under the Selection node in order to extract all of the attributes from each one of them. I have tried to get "get_xpath" to work but to no avail.

Please keep in mind that there are Message nodes under every Selection node in my XML. You only see 2 Selection nodes in the example below, but in reality I could have hundreds of "Selection" nodes, each with "Message" nodes as children. I need to extract information from the "Message" nodes under the current "Selection" node that I am working with, i.e., I don't care about the "Message" nodes under other "Selection" nodes.

<Selection id="54008473">
  <Name>Master</Name>
  <Contents>
    <Message refid="125796458" suppress="true" status="Unchanged"/>
    <Message refid="123991123" suppress="true" status="Unchanged"/>
    <Message refid="128054778" custom="true" status="New">
      <Content language="en"><![CDATA[<p>ada</p>]]></Content>
    </Message>
  </Contents>
  <Messages/>
  <MessagePriority>
    <Zone name="Insured Letter Intro">
      <MessageInstance id="125796375" name="LD Letter Introduction" status="Active" delivery="Mandatory" priority="1" suppressed="false" selected="true"/>
    </Zone>
    <Zone name="Insured Letter Logo">
      <MessageInstance id="125794623" name="Insured Letter Logo" status="Active" delivery="Mandatory" priority="1" suppressed="false" selected="true"/>
    </Zone>
  </MessagePriority>
</Selection>
<Selection id="54008475" datavaluerefid="54008479">
  <Name>RMBC</Name>
  <Contents>
    <Message refid="125796458" sameasparent="true" parentrefid="54008473" status="Unchanged"/>
    <Message refid="123991123" sameasparent="true" parentrefid="54008473" status="Unchanged"/>
    <Message refid="128054778" custom="true" status="New">
      <Content language="en"><![CDATA[<p>ada</p>]]></Content>
    </Message>
  </Contents>
  <Messages/>
  <MessagePriority>
     ...
  </MessagePriority>
</Selection>

Solution

  • Use findnodes() with a relative XPath in the handler for Selection to find only the Contents/Message nodes under the current Selection (in XML::Twig, findnodes() is the same method as get_xpath()):

    #!/usr/bin/perl
    use warnings;
    use strict;
    
    use XML::Twig;
    
    my %selections;
    
    my $twig = XML::Twig->new(
        twig_handlers => {
            Selection => sub {
                #$_->print();
                print "selection id: ", $_->att('id'), "\n";
    
                my @messages;
                foreach my $message ($_->findnodes('./Contents/Message')) {
                    #$message->print();
                    print "message refid: ", $message->att('refid'), "\n";
    
                    # store "refid" attribute in messages list
                    push(@messages, $message->att('refid'));
                }
    
                # store the collected "refid" values under the selection ID
                $selections{ $_->att('id') } = \@messages;
            },
        }
    );
    
    $twig->parse(\*DATA);
    
    while (my($id, $messages) = each %selections) {
        print "Selection '${id}' messages: @{ $messages }\n";
    }
    
    exit 0;
    
    __DATA__
    <?xml version="1.0" encoding="UTF-8"?>
    <Root>
      <Selection id="54008473">
        <Name>Master</Name>
        <Contents>
          <Message refid="125796458" suppress="true" status="Unchanged"/>
          <Message refid="123991123" suppress="true" status="Unchanged"/>
          <Message refid="128054778" custom="true" status="New">
            <Content language="en"><![CDATA[<p>ada</p>]]></Content>
          </Message>
        </Contents>
        <Messages/>
        <MessagePriority>
          <Zone name="Insured Letter Intro">
            <MessageInstance id="125796375" name="LD Letter Introduction" status="Active" delivery="Mandatory" priority="1" suppressed="false" selected="true"/>
          </Zone>
          <Zone name="Insured Letter Logo">
            <MessageInstance id="125794623" name="Insured Letter Logo" status="Active" delivery="Mandatory" priority="1" suppressed="false" selected="true"/>
          </Zone>
        </MessagePriority>
      </Selection>
      <Selection id="54008475" datavaluerefid="54008479">
        <Name>RMBC</Name>
        <Contents>
          <Message refid="125796458" sameasparent="true" parentrefid="54008473" status="Unchanged"/>
          <Message refid="123991123" sameasparent="true" parentrefid="54008473" status="Unchanged"/>
          <Message refid="128054778" custom="true" status="New">
            <Content language="en"><![CDATA[<p>ada</p>]]></Content>
          </Message>
        </Contents>
        <Messages/>
        <MessagePriority>
          ...
        </MessagePriority>
      </Selection>
    </Root>
    

    Test run:

    $ perl dummy.pl
    selection id: 54008473
    message refid: 125796458
    message refid: 123991123
    message refid: 128054778
    selection id: 54008475
    message refid: 125796458
    message refid: 123991123
    message refid: 128054778
    Selection '54008473' messages: 125796458 123991123 128054778
    Selection '54008475' messages: 125796458 123991123 128054778
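
The question asks for all of the attributes of each Message, not only "refid". A minimal sketch of one way to do that, assuming the same document layout as above: atts() returns a hash reference with every attribute of an element, and purge() on the twig releases what has already been parsed, which may help when there are hundreds of Selection nodes. The hash name %messages_by_selection and the shortened inline XML are made up for this illustration.

```perl
#!/usr/bin/perl
use strict;
use warnings;

use XML::Twig;

# Hypothetical structure: selection id => list of attribute hashes,
# one hash per Message under that Selection
my %messages_by_selection;

my $twig = XML::Twig->new(
    twig_handlers => {
        Selection => sub {
            my ($t, $selection) = @_;

            my @messages;
            foreach my $message ($selection->findnodes('./Contents/Message')) {
                # atts() returns a hash ref of all attributes; copy it so the
                # data is still available after the element has been purged
                push @messages, { %{ $message->atts || {} } };
            }
            $messages_by_selection{ $selection->att('id') } = \@messages;

            # free the memory used by everything parsed so far
            $t->purge;
        },
    },
);

# Shortened sample input (assumption: same structure as the XML above)
$twig->parse(<<'XML');
<Root>
  <Selection id="54008473">
    <Name>Master</Name>
    <Contents>
      <Message refid="125796458" suppress="true" status="Unchanged"/>
      <Message refid="128054778" custom="true" status="New">
        <Content language="en"><![CDATA[<p>ada</p>]]></Content>
      </Message>
    </Contents>
  </Selection>
</Root>
XML

# Print every attribute of every Message, grouped by Selection
foreach my $id (sort keys %messages_by_selection) {
    foreach my $atts (@{ $messages_by_selection{$id} }) {
        print "Selection $id: ",
            join(', ', map { "$_=$atts->{$_}" } sort keys %$atts), "\n";
    }
}
```

Because the attribute hashes are copied before purge() is called, the collected data does not depend on the parsed elements staying in memory. If the whole file fits in memory comfortably, the purge() call can simply be dropped.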