I am using XML::Twig to parse a file in my Perl script, and I am a bit new to it. My XML file contains entries like the following (a simplified sample):
<?xml version="1.0" encoding="UTF-8"?>
<mytag1 name="abc">
<mytag2>This is line 1.
This is line 2.
This is line 3.
</mytag2>
</mytag1>
In my Perl script, I am doing something like:
my $twig = XML::Twig->new( keep_encoding=>1, keep_atts_order=>1, pretty_print => 'indented', comments => 'keep' );
$twig->parsefile($in_file);
I have some validation code around this, after which output like the following is generated:
<?xml version="1.0" encoding="UTF-8"?>
<mytag1 name="abc">
<mytag2>This is line 1.
This is line 2.
This is line 3.
</mytag2>
</mytag1>
Extra blank lines are being generated in the output, and I am not sure what's going wrong. I searched around but couldn't find much useful information on this. Any help will be appreciated.
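Remove the keep_encoding option. It's useless since the input is in UTF-8, and it makes the module bypass some of the parser's features, notably the one that normalizes line endings (CR/LF). If your file has Windows-style CR LF line endings, that is most likely where the extra blank lines come from.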
It should not be used anyway: it's a relic of a time when Unicode was not as prevalent as today. It allowed people stuck with old encodings to still be able to process their XML.
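For illustration, here is a minimal sketch of the same setup with keep_encoding dropped and the other options from the question left as they were. The inline sample string with explicit CR LF line endings and the $xml and $out variable names are assumptions made to keep the example self-contained; in the real script you would keep calling parsefile($in_file) as before.

```perl
use strict;
use warnings;
use XML::Twig;

# Hypothetical stand-in for the input file, written with Windows-style
# CR LF line endings to mimic the suspected cause of the blank lines.
my $xml = join "\r\n",
    '<?xml version="1.0" encoding="UTF-8"?>',
    '<mytag1 name="abc">',
    '<mytag2>This is line 1.',
    'This is line 2.',
    'This is line 3.',
    '</mytag2>',
    '</mytag1>',
    '';

# Same options as in the question, minus keep_encoding.
# Note: keep_atts_order needs the Tie::IxHash module to be installed.
my $twig = XML::Twig->new(
    keep_atts_order => 1,
    pretty_print    => 'indented',
    comments        => 'keep',
);

$twig->parse($xml);        # in the real script: $twig->parsefile($in_file);

my $out = $twig->sprint;   # serialize the whole document to a string
print $out;
```

Without keep_encoding, the underlying parser is free to normalize the line endings, so no stray carriage returns should make it into the output.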
Thanks ikegami!