I occasionally need to extract data from a large XML database export. The file size is between 600 and 700 MBs. With a few days of research I concluded that XML::Twig
is the way to go, because its handlers allow me to process the file chunk-by-chunk.
I followed the examples, and with some hours of trial and error I wrote the Perl code that performed the job I needed to do. The code worked, it extracted the data I wanted and calculated the details I needed. But while running the script (took about 15 minutes to process the 620MBs XML) I noticed in Activity Monitor that towards the end the memory usage quickly increases to an unreasonably high level.
I removed the Perl code that processes the XML tag I was interested in and replaced it with a single instruction to increment a variable, simply counting how many of product
elements had been found. The memory usage was similar. First normal, then towards the final few elements it started to rapidly increase and eat up everything I have.
I added some code to monitor time, the number of items processed, and how much memory my Perl process is using. This code is shown here:
#!/usr/bin/perl
use strict;
use utf8;
use XML::Twig;
binmode STDOUT, ":encoding(UTF-8)";
binmode STDERR, ":encoding(UTF-8)";
my $my_processID = $$;
my ($xml_file_name, $second_arg) = @ARGV;
unless (defined($xml_file_name)) { die "Please provide XML filename as 1st argument."; }
my $product_counter = 0;
my @shell_command_output_text;
my $resource_usage_status;
my $reference_timestamp;
my $datetimeNow;
my $t = XML::Twig->new(
twig_handlers => { 'product' => \&do_what_needs_to_be_done_with_each_product },
pretty_print => 'indented'
);
$reference_timestamp = time();
$t->parsefile( $xml_file_name, ErrorContext => 2 );
print "(Parse finished)\n";
showstatus();
$t->purge;
print "(Purge finished)\n";
showstatus();
print "\nJOB COMPLETE.\n$product_counter items processed.\n";
exit 0;
sub do_what_needs_to_be_done_with_each_product() {
my( $twig, $product)= @_;
$product_counter++;
## Display status every 5 seconds:
if ( ( time()-$reference_timestamp ) > 5 ) {
showstatus();
$reference_timestamp = time();
}
$twig->purge;
} ## end-sub
sub showstatus() {
@shell_command_output_text = `top -l 1 -pid $my_processID -stats pid,command,cpu,mem`;
## I only need the last line of that output.
$resource_usage_status = $shell_command_output_text[$#shell_command_output_text];
$datetimeNow = localtime();
chomp $resource_usage_status;
print "Res: $resource_usage_status -> item count: $product_counter <- time: $datetimeNow\n";
} ## end-sub
And here is what was shown on the screen when I run the above script.
keve@deimos:DATA$ ./twigExample.pl product_db_extract.xml
Res: 3237 perl5.18 0.0 12M+ -> item count: 3086 <- time: Sat Nov 18 23:45:48 2017
Res: 3237 perl5.18 0.0 12M+ -> item count: 6521 <- time: Sat Nov 18 23:45:55 2017
Res: 3237 perl5.18 0.0 12M+ -> item count: 9909 <- time: Sat Nov 18 23:46:02 2017
Res: 3237 perl5.18 0.0 12M+ -> item count: 12751 <- time: Sat Nov 18 23:46:09 2017
Res: 3237 perl5.18 0.0 12M+ -> item count: 15570 <- time: Sat Nov 18 23:46:16 2017
Res: 3237 perl5.18 0.0 12M+ -> item count: 18151 <- time: Sat Nov 18 23:46:23 2017
Res: 3237 perl5.18 0.0 12M+ -> item count: 20828 <- time: Sat Nov 18 23:46:30 2017
Res: 3237 perl5.18 0.0 12M+ -> item count: 23561 <- time: Sat Nov 18 23:46:37 2017
Res: 3237 perl5.18 0.0 12M+ -> item count: 26230 <- time: Sat Nov 18 23:46:44 2017
Res: 3237 perl5.18 0.0 12M+ -> item count: 28861 <- time: Sat Nov 18 23:46:51 2017
Res: 3237 perl5.18 0.0 12M+ -> item count: 31665 <- time: Sat Nov 18 23:46:58 2017
Res: 3237 perl5.18 0.0 13M+ -> item count: 34443 <- time: Sat Nov 18 23:47:05 2017
Res: 3237 perl5.18 0.0 13M+ -> item count: 36952 <- time: Sat Nov 18 23:47:12 2017
Res: 3237 perl5.18 0.0 13M+ -> item count: 39461 <- time: Sat Nov 18 23:47:19 2017
Res: 3237 perl5.18 0.0 13M+ -> item count: 42113 <- time: Sat Nov 18 23:47:26 2017
Res: 3237 perl5.18 0.0 13M+ -> item count: 44794 <- time: Sat Nov 18 23:47:33 2017
Res: 3237 perl5.18 0.0 13M+ -> item count: 47510 <- time: Sat Nov 18 23:47:40 2017
Res: 3237 perl5.18 0.0 15M+ -> item count: 50154 <- time: Sat Nov 18 23:47:47 2017
Res: 3237 perl5.18 0.0 15M+ -> item count: 52869 <- time: Sat Nov 18 23:47:54 2017
Res: 3237 perl5.18 0.0 15M+ -> item count: 55497 <- time: Sat Nov 18 23:48:01 2017
Res: 3237 perl5.18 0.0 15M+ -> item count: 57725 <- time: Sat Nov 18 23:48:08 2017
Res: 3237 perl5.18 0.0 15M+ -> item count: 59754 <- time: Sat Nov 18 23:48:15 2017
Res: 3237 perl5.18 0.0 15M+ -> item count: 61600 <- time: Sat Nov 18 23:48:22 2017
Res: 3237 perl5.18 0.0 15M+ -> item count: 63244 <- time: Sat Nov 18 23:48:29 2017
Res: 3237 perl5.18 0.0 15M+ -> item count: 64801 <- time: Sat Nov 18 23:48:36 2017
Res: 3237 perl5.18 0.0 15M+ -> item count: 66448 <- time: Sat Nov 18 23:48:43 2017
Res: 3237 perl5.18 0.0 15M+ -> item count: 68022 <- time: Sat Nov 18 23:48:50 2017
Res: 3237 perl5.18 0.0 15M+ -> item count: 69561 <- time: Sat Nov 18 23:48:57 2017
Res: 3237 perl5.18 0.0 15M+ -> item count: 71180 <- time: Sat Nov 18 23:49:04 2017
Res: 3237 perl5.18 0.0 15M+ -> item count: 73009 <- time: Sat Nov 18 23:49:11 2017
Res: 3237 perl5.18 0.0 15M+ -> item count: 75259 <- time: Sat Nov 18 23:49:18 2017
Res: 3237 perl5.18 0.0 15M+ -> item count: 77331 <- time: Sat Nov 18 23:49:25 2017
Res: 3237 perl5.18 0.0 15M+ -> item count: 79344 <- time: Sat Nov 18 23:49:32 2017
Res: 3237 perl5.18 0.0 15M+ -> item count: 81389 <- time: Sat Nov 18 23:49:39 2017
Res: 3237 perl5.18 0.0 15M+ -> item count: 83345 <- time: Sat Nov 18 23:49:46 2017
Res: 3237 perl5.18 0.0 15M+ -> item count: 85346 <- time: Sat Nov 18 23:49:53 2017
Res: 3237 perl5.18 0.0 15M+ -> item count: 87350 <- time: Sat Nov 18 23:50:00 2017
Res: 3237 perl5.18 0.0 15M+ -> item count: 89359 <- time: Sat Nov 18 23:50:07 2017
Res: 3237 perl5.18 0.0 15M+ -> item count: 91374 <- time: Sat Nov 18 23:50:14 2017
Res: 3237 perl5.18 0.0 15M+ -> item count: 93451 <- time: Sat Nov 18 23:50:21 2017
Res: 3237 perl5.18 0.0 15M+ -> item count: 95542 <- time: Sat Nov 18 23:50:28 2017
Res: 3237 perl5.18 0.0 15M+ -> item count: 97635 <- time: Sat Nov 18 23:50:35 2017
Res: 3237 perl5.18 0.0 15M+ -> item count: 99764 <- time: Sat Nov 18 23:50:42 2017
Res: 3237 perl5.18 0.0 15M+ -> item count: 101914 <- time: Sat Nov 18 23:50:49 2017
Res: 3237 perl5.18 0.0 15M+ -> item count: 103896 <- time: Sat Nov 18 23:50:56 2017
Res: 3237 perl5.18 0.0 15M+ -> item count: 105982 <- time: Sat Nov 18 23:51:03 2017
Res: 3237 perl5.18 0.0 15M+ -> item count: 108023 <- time: Sat Nov 18 23:51:10 2017
Res: 3237 perl5.18 0.0 15M+ -> item count: 110090 <- time: Sat Nov 18 23:51:17 2017
Res: 3237 perl5.18 0.0 15M+ -> item count: 112033 <- time: Sat Nov 18 23:51:24 2017
Res: 3237 perl5.18 0.0 15M+ -> item count: 114039 <- time: Sat Nov 18 23:51:31 2017
Res: 3237 perl5.18 0.0 15M+ -> item count: 116038 <- time: Sat Nov 18 23:51:38 2017
Res: 3237 perl5.18 0.0 15M+ -> item count: 118031 <- time: Sat Nov 18 23:51:45 2017
(Parse finished)
Res: 3237 perl5.18 0.0 3780M+ -> item count: 119335 <- time: Sat Nov 18 23:54:21 2017
(Purge finished)
Res: 3237 perl5.18 0.0 2634M+ -> item count: 119335 <- time: Sat Nov 18 23:54:27 2017
JOB COMPLETE.
119335 items processed.
keve@deimos:DATA$
Notice the 3 minutes gap where "Parse finished" is printed.
I expected to see the memory usage go up for the final few product tags, just like I have seen it in Activity Monitor. But the memory usage printed—up to the very last product tag—was a perfectly acceptable 15 MB. The tricky part is not shown in this output: only the missing 3 minutes time and the 3780M indicates that something odd happens right after the final product tag.
So I wrote a shell script to keep track of memory usage during those missing 3 minutes at the end too. Here is what I was running in a separate window while the perl script was processing the XML in its own window.
#!/bin/bash
while true
do
perlStat=`top -l 1 -pid $1 -stats pid,command,cpu,mem,purg,vsize | grep perl5`
echo "`date` ::> $perlStat"
sleep 5
done
And here is the output of that shell script running while the perl script was processing the XML:
keve@deimos:DATA$ ./monitorTWIGprocess.sh 3237
2017 Nov 18 Szo 23:45:53 CET ::> 3237 perl5.18 0.0 12M+ 0B N/A
2017 Nov 18 Szo 23:46:00 CET ::> 3237 perl5.18 0.0 12M+ 0B N/A
2017 Nov 18 Szo 23:46:06 CET ::> 3237 perl5.18 0.0 12M+ 0B N/A
2017 Nov 18 Szo 23:46:12 CET ::> 3237 perl5.18 0.0 12M+ 0B N/A
2017 Nov 18 Szo 23:46:18 CET ::> 3237 perl5.18 0.0 12M+ 0B N/A
2017 Nov 18 Szo 23:46:24 CET ::> 3237 perl5.18 0.0 12M+ 0B N/A
2017 Nov 18 Szo 23:46:30 CET ::> 3237 perl5.18 0.0 12M+ 0B N/A
2017 Nov 18 Szo 23:46:36 CET ::> 3237 perl5.18 0.0 12M+ 0B N/A
2017 Nov 18 Szo 23:46:42 CET ::> 3237 perl5.18 0.0 12M+ 0B N/A
2017 Nov 18 Szo 23:46:48 CET ::> 3237 perl5.18 0.0 12M+ 0B N/A
2017 Nov 18 Szo 23:46:54 CET ::> 3237 perl5.18 0.0 12M+ 0B N/A
2017 Nov 18 Szo 23:47:01 CET ::> 3237 perl5.18 0.0 12M+ 0B N/A
2017 Nov 18 Szo 23:47:07 CET ::> 3237 perl5.18 0.0 13M+ 0B N/A
2017 Nov 18 Szo 23:47:13 CET ::> 3237 perl5.18 0.0 13M+ 0B N/A
2017 Nov 18 Szo 23:47:19 CET ::> 3237 perl5.18 0.0 13M+ 0B N/A
2017 Nov 18 Szo 23:47:25 CET ::> 3237 perl5.18 0.0 13M+ 0B N/A
2017 Nov 18 Szo 23:47:31 CET ::> 3237 perl5.18 0.0 13M+ 0B N/A
2017 Nov 18 Szo 23:47:37 CET ::> 3237 perl5.18 0.0 13M+ 0B N/A
2017 Nov 18 Szo 23:47:43 CET ::> 3237 perl5.18 0.0 15M+ 0B N/A
2017 Nov 18 Szo 23:47:49 CET ::> 3237 perl5.18 0.0 15M+ 0B N/A
2017 Nov 18 Szo 23:47:56 CET ::> 3237 perl5.18 0.0 15M+ 0B N/A
2017 Nov 18 Szo 23:48:02 CET ::> 3237 perl5.18 0.0 15M+ 0B N/A
2017 Nov 18 Szo 23:48:08 CET ::> 3237 perl5.18 0.0 15M+ 0B N/A
2017 Nov 18 Szo 23:48:14 CET ::> 3237 perl5.18 0.0 15M+ 0B N/A
2017 Nov 18 Szo 23:48:20 CET ::> 3237 perl5.18 0.0 15M+ 0B N/A
2017 Nov 18 Szo 23:48:26 CET ::> 3237 perl5.18 0.0 15M+ 0B N/A
2017 Nov 18 Szo 23:48:32 CET ::> 3237 perl5.18 0.0 15M+ 0B N/A
2017 Nov 18 Szo 23:48:38 CET ::> 3237 perl5.18 0.0 15M+ 0B N/A
2017 Nov 18 Szo 23:48:44 CET ::> 3237 perl5.18 0.0 15M+ 0B N/A
2017 Nov 18 Szo 23:48:51 CET ::> 3237 perl5.18 0.0 15M+ 0B N/A
2017 Nov 18 Szo 23:48:57 CET ::> 3237 perl5.18 0.0 15M+ 0B N/A
2017 Nov 18 Szo 23:49:03 CET ::> 3237 perl5.18 0.0 15M+ 0B N/A
2017 Nov 18 Szo 23:49:09 CET ::> 3237 perl5.18 0.0 15M+ 0B N/A
2017 Nov 18 Szo 23:49:15 CET ::> 3237 perl5.18 0.0 15M+ 0B N/A
2017 Nov 18 Szo 23:49:21 CET ::> 3237 perl5.18 0.0 15M+ 0B N/A
2017 Nov 18 Szo 23:49:27 CET ::> 3237 perl5.18 0.0 15M+ 0B N/A
2017 Nov 18 Szo 23:49:33 CET ::> 3237 perl5.18 0.0 15M+ 0B N/A
2017 Nov 18 Szo 23:49:39 CET ::> 3237 perl5.18 0.0 15M+ 0B N/A
2017 Nov 18 Szo 23:49:46 CET ::> 3237 perl5.18 0.0 15M+ 0B N/A
2017 Nov 18 Szo 23:49:52 CET ::> 3237 perl5.18 0.0 15M+ 0B N/A
2017 Nov 18 Szo 23:49:58 CET ::> 3237 perl5.18 0.0 15M+ 0B N/A
2017 Nov 18 Szo 23:50:04 CET ::> 3237 perl5.18 0.0 15M+ 0B N/A
2017 Nov 18 Szo 23:50:10 CET ::> 3237 perl5.18 0.0 15M+ 0B N/A
2017 Nov 18 Szo 23:50:16 CET ::> 3237 perl5.18 0.0 15M+ 0B N/A
2017 Nov 18 Szo 23:50:22 CET ::> 3237 perl5.18 0.0 15M+ 0B N/A
2017 Nov 18 Szo 23:50:28 CET ::> 3237 perl5.18 0.0 15M+ 0B N/A
2017 Nov 18 Szo 23:50:35 CET ::> 3237 perl5.18 0.0 15M+ 0B N/A
2017 Nov 18 Szo 23:50:41 CET ::> 3237 perl5.18 0.0 15M+ 0B N/A
2017 Nov 18 Szo 23:50:47 CET ::> 3237 perl5.18 0.0 15M+ 0B N/A
2017 Nov 18 Szo 23:50:53 CET ::> 3237 perl5.18 0.0 15M+ 0B N/A
2017 Nov 18 Szo 23:50:59 CET ::> 3237 perl5.18 0.0 15M+ 0B N/A
2017 Nov 18 Szo 23:51:05 CET ::> 3237 perl5.18 0.0 15M+ 0B N/A
2017 Nov 18 Szo 23:51:11 CET ::> 3237 perl5.18 0.0 15M+ 0B N/A
2017 Nov 18 Szo 23:51:17 CET ::> 3237 perl5.18 0.0 15M+ 0B N/A
2017 Nov 18 Szo 23:51:23 CET ::> 3237 perl5.18 0.0 15M+ 0B N/A
2017 Nov 18 Szo 23:51:29 CET ::> 3237 perl5.18 0.0 15M+ 0B N/A
2017 Nov 18 Szo 23:51:36 CET ::> 3237 perl5.18 0.0 15M+ 0B N/A
2017 Nov 18 Szo 23:51:42 CET ::> 3237 perl5.18 0.0 15M+ 0B N/A
2017 Nov 18 Szo 23:51:48 CET ::> 3237 perl5.18 0.0 15M+ 0B N/A
2017 Nov 18 Szo 23:51:54 CET ::> 3237 perl5.18 0.0 125M+ 0B N/A
2017 Nov 18 Szo 23:52:00 CET ::> 3237 perl5.18 0.0 278M+ 0B N/A
2017 Nov 18 Szo 23:52:06 CET ::> 3237 perl5.18 0.0 430M+ 0B N/A
2017 Nov 18 Szo 23:52:12 CET ::> 3237 perl5.18 0.0 584M+ 0B N/A
2017 Nov 18 Szo 23:52:18 CET ::> 3237 perl5.18 0.0 737M+ 0B N/A
2017 Nov 18 Szo 23:52:24 CET ::> 3237 perl5.18 0.0 889M+ 0B N/A
2017 Nov 18 Szo 23:52:31 CET ::> 3237 perl5.18 0.0 1042M+ 0B N/A
2017 Nov 18 Szo 23:52:37 CET ::> 3237 perl5.18 0.0 1195M+ 0B N/A
2017 Nov 18 Szo 23:52:43 CET ::> 3237 perl5.18 0.0 1349M+ 0B N/A
2017 Nov 18 Szo 23:52:49 CET ::> 3237 perl5.18 0.0 1502M+ 0B N/A
2017 Nov 18 Szo 23:52:55 CET ::> 3237 perl5.18 0.0 1656M+ 0B N/A
2017 Nov 18 Szo 23:53:01 CET ::> 3237 perl5.18 0.0 1809M+ 0B N/A
2017 Nov 18 Szo 23:53:07 CET ::> 3237 perl5.18 0.0 1961M+ 0B N/A
2017 Nov 18 Szo 23:53:13 CET ::> 3237 perl5.18 0.0 2114M+ 0B N/A
2017 Nov 18 Szo 23:53:19 CET ::> 3237 perl5.18 0.0 2267M+ 0B N/A
2017 Nov 18 Szo 23:53:26 CET ::> 3237 perl5.18 0.0 2420M+ 0B N/A
2017 Nov 18 Szo 23:53:32 CET ::> 3237 perl5.18 0.0 2573M+ 0B N/A
2017 Nov 18 Szo 23:53:38 CET ::> 3237 perl5.18 0.0 2726M+ 0B N/A
2017 Nov 18 Szo 23:53:44 CET ::> 3237 perl5.18 0.0 2879M+ 0B N/A
2017 Nov 18 Szo 23:53:50 CET ::> 3237 perl5.18 0.0 3032M+ 0B N/A
2017 Nov 18 Szo 23:53:56 CET ::> 3237 perl5.18 0.0 3186M+ 0B N/A
2017 Nov 18 Szo 23:54:02 CET ::> 3237 perl5.18 0.0 3339M+ 0B N/A
2017 Nov 18 Szo 23:54:08 CET ::> 3237 perl5.18 0.0 3492M+ 0B N/A
2017 Nov 18 Szo 23:54:15 CET ::> 3237 perl5.18 0.0 3646M+ 0B N/A
2017 Nov 18 Szo 23:54:21 CET ::> 3237 perl5.18 0.0 3786M+ 0B N/A
2017 Nov 18 Szo 23:54:27 CET ::> 3237 perl5.18 0.0 3502M+ 0B N/A
2017 Nov 18 Szo 23:54:33 CET ::>
^C
keve@deimos:DATA$
Notice the increase of memory usage at 23:51:54. That is when the 3-minute gap in the other output starts. And memory usage is rapidly going up from there.
I am puzzled as to what causes this rapid increase of memory usage at the end of processing the last (119,335th) product tag. And whatever it is, why does it use so much memory? 3.5G instead of the 15M, that is not OK.
Am I doing something wrong?
Any suggestion on how to avoid the memory usage increase?
Added later, in response to the answers my original post received.
Supplement
Switching to twig_roots.
I modified the twig handler block by commenting out the handler definition and adding its twig_roots variety instead, as shown here:
my $t = XML::Twig->new(
##twig_handlers => { 'product' => \&do_what_needs_to_be_done_with_each_product },
twig_roots => { 'product' => \&do_what_needs_to_be_done_with_each_product },
twig_print_outside_roots => 0,
pretty_print => 'indented'
);
I left the $twig->purge();
instruction inside my sub.
Here is the output of the perl script:
keve@deimos:DATA$ ./twigExample.pl product_db_extract.xml
Res: 1591 perl5.18 0.0 12M+ -> item count: 2852 <- time: Sun Nov 19 15:20:51 2017
Res: 1591 perl5.18 0.0 12M+ -> item count: 6106 <- time: Sun Nov 19 15:20:58 2017
Res: 1591 perl5.18 0.0 12M+ -> item count: 9341 <- time: Sun Nov 19 15:21:05 2017
Res: 1591 perl5.18 0.0 12M+ -> item count: 12141 <- time: Sun Nov 19 15:21:12 2017
.
.
.
Res: 1591 perl5.18 0.0 15M+ -> item count: 112293 <- time: Sun Nov 19 15:26:41 2017
Res: 1591 perl5.18 0.0 15M+ -> item count: 114255 <- time: Sun Nov 19 15:26:48 2017
Res: 1591 perl5.18 0.0 15M+ -> item count: 116171 <- time: Sun Nov 19 15:26:55 2017
Res: 1591 perl5.18 0.0 15M+ -> item count: 118099 <- time: Sun Nov 19 15:27:02 2017
(Parse finished)
Res: 1591 perl5.18 0.0 15M+ -> item count: 119335 <- time: Sun Nov 19 15:27:56 2017
(Purge finished)
Res: 1591 perl5.18 0.0 15M+ -> item count: 119335 <- time: Sun Nov 19 15:27:57 2017
JOB COMPLETE.
119335 items processed.
keve@deimos:DATA$
And here is the output of the monitoring shell script:
keve@deimos:DATA$ ./monitorTWIGprocess.sh 1591
2017 Nov 19 Vas 15:20:57 CET ::> 1591 perl5.18 0.0 12M+ 0B N/A
2017 Nov 19 Vas 15:21:03 CET ::> 1591 perl5.18 0.0 12M+ 0B N/A
2017 Nov 19 Vas 15:21:09 CET ::> 1591 perl5.18 0.0 12M+ 0B N/A
2017 Nov 19 Vas 15:21:16 CET ::> 1591 perl5.18 0.0 12M+ 0B N/A
2017 Nov 19 Vas 15:21:22 CET ::> 1591 perl5.18 0.0 12M+ 0B N/A
2017 Nov 19 Vas 15:21:28 CET ::> 1591 perl5.18 0.0 12M+ 0B N/A
2017 Nov 19 Vas 15:21:34 CET ::> 1591 perl5.18 0.0 12M+ 0B N/A
2017 Nov 19 Vas 15:21:40 CET ::> 1591 perl5.18 0.0 12M+ 0B N/A
2017 Nov 19 Vas 15:21:46 CET ::> 1591 perl5.18 0.0 12M+ 0B N/A
2017 Nov 19 Vas 15:21:52 CET ::> 1591 perl5.18 0.0 12M+ 0B N/A
2017 Nov 19 Vas 15:21:58 CET ::> 1591 perl5.18 0.0 12M+ 0B N/A
2017 Nov 19 Vas 15:22:04 CET ::> 1591 perl5.18 0.0 12M+ 0B N/A
2017 Nov 19 Vas 15:22:10 CET ::> 1591 perl5.18 0.0 12M+ 0B N/A
2017 Nov 19 Vas 15:22:16 CET ::> 1591 perl5.18 0.0 13M+ 0B N/A
2017 Nov 19 Vas 15:22:22 CET ::> 1591 perl5.18 0.0 13M+ 0B N/A
2017 Nov 19 Vas 15:22:28 CET ::> 1591 perl5.18 0.0 13M+ 0B N/A
2017 Nov 19 Vas 15:22:35 CET ::> 1591 perl5.18 0.0 13M+ 0B N/A
2017 Nov 19 Vas 15:22:41 CET ::> 1591 perl5.18 0.0 13M+ 0B N/A
2017 Nov 19 Vas 15:22:47 CET ::> 1591 perl5.18 0.0 13M+ 0B N/A
2017 Nov 19 Vas 15:22:53 CET ::> 1591 perl5.18 0.0 15M+ 0B N/A
2017 Nov 19 Vas 15:22:59 CET ::> 1591 perl5.18 0.0 15M+ 0B N/A
2017 Nov 19 Vas 15:23:05 CET ::> 1591 perl5.18 0.0 15M+ 0B N/A
2017 Nov 19 Vas 15:23:11 CET ::> 1591 perl5.18 0.0 15M+ 0B N/A
2017 Nov 19 Vas 15:23:17 CET ::> 1591 perl5.18 0.0 15M+ 0B N/A
2017 Nov 19 Vas 15:23:23 CET ::> 1591 perl5.18 0.0 15M+ 0B N/A
2017 Nov 19 Vas 15:23:29 CET ::> 1591 perl5.18 0.0 15M+ 0B N/A
2017 Nov 19 Vas 15:23:35 CET ::> 1591 perl5.18 0.0 15M+ 0B N/A
2017 Nov 19 Vas 15:23:41 CET ::> 1591 perl5.18 0.0 15M+ 0B N/A
2017 Nov 19 Vas 15:23:47 CET ::> 1591 perl5.18 0.0 15M+ 0B N/A
2017 Nov 19 Vas 15:23:54 CET ::> 1591 perl5.18 0.0 15M+ 0B N/A
2017 Nov 19 Vas 15:24:00 CET ::> 1591 perl5.18 0.0 15M+ 0B N/A
2017 Nov 19 Vas 15:24:06 CET ::> 1591 perl5.18 0.0 15M+ 0B N/A
2017 Nov 19 Vas 15:24:12 CET ::> 1591 perl5.18 0.0 15M+ 0B N/A
2017 Nov 19 Vas 15:24:18 CET ::> 1591 perl5.18 0.0 15M+ 0B N/A
2017 Nov 19 Vas 15:24:24 CET ::> 1591 perl5.18 0.0 15M+ 0B N/A
2017 Nov 19 Vas 15:24:30 CET ::> 1591 perl5.18 0.0 15M+ 0B N/A
2017 Nov 19 Vas 15:24:36 CET ::> 1591 perl5.18 0.0 15M+ 0B N/A
2017 Nov 19 Vas 15:24:42 CET ::> 1591 perl5.18 0.0 15M+ 0B N/A
2017 Nov 19 Vas 15:24:48 CET ::> 1591 perl5.18 0.0 15M+ 0B N/A
2017 Nov 19 Vas 15:24:54 CET ::> 1591 perl5.18 0.0 15M+ 0B N/A
2017 Nov 19 Vas 15:25:00 CET ::> 1591 perl5.18 0.0 15M+ 0B N/A
2017 Nov 19 Vas 15:25:06 CET ::> 1591 perl5.18 0.0 15M+ 0B N/A
2017 Nov 19 Vas 15:25:13 CET ::> 1591 perl5.18 0.0 15M+ 0B N/A
2017 Nov 19 Vas 15:25:19 CET ::> 1591 perl5.18 0.0 15M+ 0B N/A
2017 Nov 19 Vas 15:25:25 CET ::> 1591 perl5.18 0.0 15M+ 0B N/A
2017 Nov 19 Vas 15:25:31 CET ::> 1591 perl5.18 0.0 15M+ 0B N/A
2017 Nov 19 Vas 15:25:37 CET ::> 1591 perl5.18 0.0 15M+ 0B N/A
2017 Nov 19 Vas 15:25:43 CET ::> 1591 perl5.18 0.0 15M+ 0B N/A
2017 Nov 19 Vas 15:25:49 CET ::> 1591 perl5.18 0.0 15M+ 0B N/A
2017 Nov 19 Vas 15:25:55 CET ::> 1591 perl5.18 0.0 15M+ 0B N/A
2017 Nov 19 Vas 15:26:01 CET ::> 1591 perl5.18 0.0 15M+ 0B N/A
2017 Nov 19 Vas 15:26:07 CET ::> 1591 perl5.18 0.0 15M+ 0B N/A
2017 Nov 19 Vas 15:26:13 CET ::> 1591 perl5.18 0.0 15M+ 0B N/A
2017 Nov 19 Vas 15:26:19 CET ::> 1591 perl5.18 0.0 15M+ 0B N/A
2017 Nov 19 Vas 15:26:26 CET ::> 1591 perl5.18 0.0 15M+ 0B N/A
2017 Nov 19 Vas 15:26:32 CET ::> 1591 perl5.18 0.0 15M+ 0B N/A
2017 Nov 19 Vas 15:26:38 CET ::> 1591 perl5.18 0.0 15M+ 0B N/A
2017 Nov 19 Vas 15:26:44 CET ::> 1591 perl5.18 0.0 15M+ 0B N/A
2017 Nov 19 Vas 15:26:50 CET ::> 1591 perl5.18 0.0 15M+ 0B N/A
2017 Nov 19 Vas 15:26:56 CET ::> 1591 perl5.18 0.0 15M+ 0B N/A
2017 Nov 19 Vas 15:27:02 CET ::> 1591 perl5.18 0.0 15M+ 0B N/A
2017 Nov 19 Vas 15:27:08 CET ::> 1591 perl5.18 0.0 15M+ 0B N/A
2017 Nov 19 Vas 15:27:14 CET ::> 1591 perl5.18 0.0 15M+ 0B N/A
2017 Nov 19 Vas 15:27:20 CET ::> 1591 perl5.18 0.0 15M+ 0B N/A
2017 Nov 19 Vas 15:27:26 CET ::> 1591 perl5.18 0.0 15M+ 0B N/A
2017 Nov 19 Vas 15:27:32 CET ::> 1591 perl5.18 0.0 15M+ 0B N/A
2017 Nov 19 Vas 15:27:38 CET ::> 1591 perl5.18 0.0 15M+ 0B N/A
2017 Nov 19 Vas 15:27:44 CET ::> 1591 perl5.18 0.0 15M+ 0B N/A
2017 Nov 19 Vas 15:27:51 CET ::> 1591 perl5.18 0.0 15M+ 0B N/A
2017 Nov 19 Vas 15:27:57 CET ::> 1591 perl5.18 0.0 15M+ 0B N/A
2017 Nov 19 Vas 15:28:03 CET ::>
^C
keve@deimos:DATA$
The good news is that using the twig_roots approach instead of a twig handler makes my perl code behave as I wanted it. It goes through a giant XML file end extracts data from each product (the extracted data is not in XML format) while the amount of memory used stays below 20 megabytes. Splendid!
The bad news is that I am still unable to explain why using the handler ate up that much memory, and why it did that after the last product was read. Up to the last product, memory usage was perfectly fine. What made it go out of control at that point? I will make some more tests to try and satisfy my curiosity, but that is no longer tied to the issue of how my XML-processing perl code development moves forward. I am marking the twig_roots suggestion as an accepted answer, but I very much appreciate the other answer too as that one was equaly educational to work with.
You're building a complete XML data structure except for product
elements, only to discard it all at the end
You don't say what you actually want to do with the data, but if you provide twig_handlers
then XML::Twig
is expecting to produce a filtered/modified version of the input data
If you don't want any output XML at all, or you want to build your own rather than modifying the input, then I suggest you use twig_roots
instead, like this
This is untested code. I'm travelling at present and nowhere near a PC system to test with
my $t = XML::Twig->new(
twig_roots => { product => \&do_what_needs_to_be_done_with_each_product },
twig_print_outside_roots => 0,
pretty_print => 'indented',
);
Note that do_what_needs_to_be_done_with_each_product
(I hope that's not the real identifier) shouldn't contain $t->purge
when working like this