perl xml-twig

Extract subset of XML with XML::Twig


I'm trying to use XML::Twig to extract a subset of an XML document so that I can convert it to CSV.

Here's a sample of my data:

<?xml version="1.0" encoding="UTF-8" standalone="no"?>
<Actions>
  <Click>
    <Field1>Data1</Field1>
    <Field2>Data2</Field2>
  </Click>
  <Click>
    <Field1>Data3</Field1>
    <Field2>Data4</Field2>
  </Click>
</Actions>

And here's my attempt at coding the desired outcome:

#!/usr/bin/env perl

use strict;
use warnings;

use XML::Twig;
use Text::CSV; # later
use Data::Dumper;

my $file = shift @ARGV or die "Usage: $0 FILE\n";

my $twig = XML::Twig->new();
$twig->parsefile($file);
my $root = $twig->root;

my @data;

for my $node ( $twig->findnodes( '//Click/*' ) ) {
  my $key = $node->name;
  my $val = $node->text;
  push @data, { $key => $val }
}

print Dumper \@data;

which gives

$VAR1 = [
          {
            'Field1' => 'Data1'
          },
          {
            'Field2' => 'Data2'
          },
          {
            'Field1' => 'Data3'
          },
          {
            'Field2' => 'Data4'
          }
        ];

What I'm looking to create is an array of hashes (if that's the best structure), with one hash per Click:

my @AoH = (
    { Field1 => 'Data1', Field2 => 'Data2' },
    { Field1 => 'Data3', Field2 => 'Data4' },
);

I'm not sure how to loop through the data to extract this.


Solution

  • Your structure has two levels (the Click elements, and the fields inside each Click), so you need two levels of loops.

    my @data;
    for my $click_node ( $twig->findnodes( '/Actions/Click' ) ) {
       # One hash per Click element.
       my %click_data;
       for my $child_node ( $click_node->findnodes( '*' ) ) {
          my $key = $child_node->name;
          my $val = $child_node->text;
          $click_data{$key} = $val;
       }

       # %click_data is a fresh variable on each iteration, so each reference is distinct.
       push @data, \%click_data;
    }
    
    local $Data::Dumper::Sortkeys = 1;
    print(Dumper(\@data));
    

    Output:

    $VAR1 = [
              {
                'Field1' => 'Data1',
                'Field2' => 'Data2'
              },
              {
                'Field1' => 'Data3',
                'Field2' => 'Data4'
              }
            ];
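
    Since the goal is CSV, the same records can also be built while parsing by using a twig handler, then written out with Text::CSV. This is only a sketch under some assumptions: the XML is inlined as a string so the example runs on its own (use parsefile($file) in the real script), the column list is hard-coded to Field1 and Field2, and every Click is assumed to have uniquely named children.

```perl
#!/usr/bin/env perl
use strict;
use warnings;

use XML::Twig;
use Text::CSV;

# Inline copy of the sample document, for illustration only.
my $xml = <<'XML';
<?xml version="1.0" encoding="UTF-8" standalone="no"?>
<Actions>
  <Click>
    <Field1>Data1</Field1>
    <Field2>Data2</Field2>
  </Click>
  <Click>
    <Field1>Data3</Field1>
    <Field2>Data4</Field2>
  </Click>
</Actions>
XML

my @data;

my $twig = XML::Twig->new(
    twig_handlers => {
        # Called once for each <Click>, after it has been fully parsed.
        Click => sub {
            my ( $t, $click ) = @_;

            # One hash per Click: child tag name => child text.
            my %record;
            $record{ $_->tag } = $_->text for $click->children;
            push @data, \%record;

            # The values are copied, so the parsed elements can be released.
            $t->purge;
        },
    },
);
$twig->parse($xml);

# Assumed column order; adjust to the real field names.
my @columns = qw(Field1 Field2);

my $csv = Text::CSV->new( { binary => 1, eol => "\n" } );
$csv->print( \*STDOUT, \@columns );                    # header row
$csv->print( \*STDOUT, [ @{$_}{@columns} ] ) for @data;    # one row per Click
```

    With the sample data this prints a header line followed by Data1,Data2 and Data3,Data4. The handler approach avoids holding the whole document in memory, which may matter for large files; for small files the nested loops above are just as good.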