I have been using XML::Twig for a while now on fairly small XML files with no issues. About a week ago I needed to parse a much larger XML file of about 260MB. The file was contained in a zip archive (260MB is the size of the uncompressed file).
I loaded the entire file into memory (it took up about 3GB, roughly 50% of what is available, which was to be expected). I then added and modified some values and saved the file to disk using the print-to-file method. Once this was all done I performed a purge, thinking that I would get back the memory used to parse the file. However, this does not seem to be the case, and I was wondering why. I am using XML::Twig version 3.34 and Perl version 5.10.1 on a Linux machine.
My basic code structure is as follows:
my $Sheetx= $zip->contents('file1.xml');
my $tw11 = XML::Twig->new();
my $Sheetx_parse = $tw11->parse($Sheetx);
my $fh1PB_filename='file2.xml';
open(my $fh1PB, '>:encoding(UTF-8)', $fh1PB_filename) or die "Could not open file " . $fh1PB_filename . " $!";
$tw11->print($fh1PB);
close($fh1PB);
$tw11->purge();
my $member1 = $zip->removeMember('file1.xml');
my $member1A = $zip->addFile($fh1PB_filename,'file1.xml','8');
Any help much appreciated.
P.S. I know I could use twig handlers to reduce memory usage but would like to know why the purge idea does not seem to work when parsing the entire file.
The basic reason is that Perl generally does not release the memory it has used back to the system.
The purge call is actually useless here. You just have to let the twig go out of scope, and the memory will be released... for Perl to use again. So if you parsed several files, the memory used would be the amount used by the biggest file.
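A minimal sketch of letting the twig go out of scope, assuming each document can be handled inside its own sub. The helper name count_b and the sample documents are made up for illustration; they are not from the question.

```perl
use strict;
use warnings;
use XML::Twig;

# Hypothetical helper: the twig is a lexical inside the sub, so it goes
# out of scope when the sub returns and Perl can reuse its memory for
# the next document.
sub count_b {
    my ($xml) = @_;
    my $twig = XML::Twig->new();
    $twig->parse($xml);
    my @b = $twig->root->children('b');   # direct <b> children of the root
    return scalar @b;
}

# Illustrative inputs only.
for my $doc ('<a><b/><b/></a>', '<a><b/></a>') {
    print count_b($doc), "\n";
}
```

The process size as seen by the system would not be expected to shrink between documents; the freed memory simply becomes available to Perl again.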
purge is meant to be used in handlers, so that the memory from a part of the XML tree is released (to Perl, not the system) and reused for the next part. The memory used is then the amount used by the biggest part kept in memory at once.
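For illustration, here is a rough sketch of purge inside a handler. The element name row and the inline sample document are assumptions, not taken from the question; with a real file you would use the element that repeats in your data.

```perl
use strict;
use warnings;
use XML::Twig;

# Illustrative document; "row" is a hypothetical repeating element.
my $xml = '<rows><row id="1">a</row><row id="2">b</row><row id="3">c</row></rows>';

my $count = 0;
my $twig = XML::Twig->new(
    twig_handlers => {
        row => sub {
            my ($t, $row) = @_;
            $count++;      # do the real work on $row here
            $t->purge;     # discard what has been parsed so far
        },
    },
);
$twig->parse($xml);
print "rows seen: $count\n";
```

If the modified document also has to be written out, XML::Twig's flush method can be used in the handler instead of purge: it prints the part parsed so far and then frees it.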