
How do I select a specific sub-node of an XML file using XML::Twig?


I have an XML file:

<?xml version="1.0" encoding="utf-16"?>
<!DOCTYPE tmx SYSTEM "56.dtd">
<body>
<tu changedate="20130625T175037Z">
  <tuv xml:lang="pt-pt">
    <prop type="x-context-pre">&lt;seg&gt;Some text.&lt;/seg&gt;</prop>
    <prop type="x-context-post">&lt;seg&gt;Other text.&lt;/seg&gt;</prop>
    <seg>The text I'm interested.</seg>
  </tuv>
  <tuv xml:lang="it">
    <seg>And it's translation in italian.</seg>
  </tuv>
 </tu> 

 .... followed by other <tu>'s
</body>

Since the file is huge, I'm using XML::Twig to parse it and extract the parts I need: the text content of each seg node and the changedate attribute of the tu node.

Here's the code I've got so far:

use 5.010;
use strict;
use warnings;
use XML::Twig;

my $filename = 'filename.tmx';
my $out_filename = 'out.xml';
open my $out, '>', $out_filename or die "Cannot open $out_filename: $!";
binmode $out;

my $original_twig = XML::Twig->new(pretty_print => 'nsgmls', twig_handlers => {tu => \&original_tu});
$original_twig->parsefile($filename);

sub original_tu {
    my($twig, $original_tu) = @_;
    my $original_seg = $original_tu-> first_child('./tuv/seg')->text;
   
}

Perl (or should I say XML::Twig) tells me that I've got:

wrong navigation condition './tuv/seg' ()

Does anyone know how to access the text of the seg node and the changedate attribute of the tu node?


Solution

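  • first_child accepts a condition on a direct child (such as a tag name), not a path like './tuv/seg'. Here is one way to access that node and attribute, using one call per level: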

    my $original_seg = $original_tu->first_child('tuv')->first_child('seg')->text;
    my $date = $original_tu->att('changedate');
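
To see those two calls in context, here is a minimal sketch of a complete handler. It makes some assumptions that are not in the question: the XML is a trimmed inline sample without the utf-16 declaration and DOCTYPE (the real file would go through parsefile), and the names collect_tu and @units are made up for illustration. It also calls purge after each tu, which is one common way to keep memory use down on a large file.

```perl
use strict;
use warnings;
use XML::Twig;

# Trimmed stand-in for the TMX file from the question (assumption: the real
# file would be read with $twig->parsefile($filename) instead).
my $xml = <<'XML';
<body>
<tu changedate="20130625T175037Z">
  <tuv xml:lang="pt-pt">
    <prop type="x-context-pre">&lt;seg&gt;Some text.&lt;/seg&gt;</prop>
    <seg>The text I'm interested.</seg>
  </tuv>
  <tuv xml:lang="it">
    <seg>And it's translation in italian.</seg>
  </tuv>
</tu>
</body>
XML

my @units;    # hypothetical result store: one hash per <tu>

my $twig = XML::Twig->new(
    twig_handlers => { tu => \&collect_tu },
);
$twig->parse($xml);

# Called once for every completely parsed <tu> element.
sub collect_tu {
    my ($t, $tu) = @_;

    my %unit = (changedate => $tu->att('changedate'));

    # Walk the direct <tuv> children and keep each one's <seg> text,
    # keyed by the tuv's xml:lang attribute.
    for my $tuv ($tu->children('tuv')) {
        my $seg = $tuv->first_child('seg') or next;    # skip a tuv with no seg
        $unit{ $tuv->att('xml:lang') } = $seg->text;
    }

    push @units, \%unit;

    # Discard what has been parsed so far; the data we need is already copied.
    $t->purge;
}

printf "%s: %s\n", $_->{changedate}, $_->{'pt-pt'} for @units;
```

If only the first seg under a tu is needed and the nesting depth does not matter, $tu->first_descendant('seg') is a shorter alternative to chaining first_child.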