I have a file containing about 70,000 records, structured roughly like this:
01499 1000642 4520101000900000
...more numbers...
104000900169
+Fieldname1
-Content
+Fieldname2
-Content
-Content
-Content
+Fieldname3
-Content
-Content
+Fieldname4
-Content
+Fieldname5
-Content
-Content
-Content
-Content
-Content
-Content
01473 1000642 4520101000900000
...more numbers...
Every record thus starts with a column of numbers and ends with a blank line. Before this blank line, most records have a +Fieldname5 and one or more -Content lines.
What I would like to do is merge all multi-line entries into one line, replacing the leading minus character of each continuation line with a space, except for the lines belonging to the last field (i.e. Fieldname5 in this case).
It should look like this:
01499 1000642 4520101000900000
...more numbers...
104000900169
+Fieldname1
-Content
+Fieldname2
-Content Content Content
+Fieldname3
-Content Content
+Fieldname4
-Content
+Fieldname5
-Content
-Content
-Content
-Content
-Content
-Content
01473 1000642 4520101000900000
...more numbers...
What I have now is this (adapted from this answer):
use strict;
use warnings;
our $input = "export.txt";
our $output = "export2.txt";
open our $in, "<$input" or die "$!\n";
open our $out, ">$output" or die "$!\n";
my $this_line = "";
my $new = "";
while(<$in>) {
my $last_line = $this_line;
$this_line = $_;
# If both $last_line and $this_line start with a "-" do the following:
if ($last_line =~ /^-.+/ && $this_line =~ /^-.+/) {
# Remove \n from $last_line
chomp $last_line;
# Remove leading "-" from $this_line
$this_line =~ s/^-//;
# Join both lines and print them to the file
$new = join(' ', $last_line, $this_line);
print $out $new;
} else {
print $out $last_line;
}
}
close ($in);
close ($out);
But there are two problems with this:
It correctly prints out the joined line, but then still prints out the second line, e.g.,
+Fieldname2 -Content Content Content -Content
So how can I make the script output only the joined line?
How can I do the following? Replace \n- by a space, except if it belongs to a given fieldname (e.g., Fieldname5).
It worked! I just added another conditional at the beginning:
use strict;
use warnings;
our $input = "export.txt";
our $output = "export2.txt";
open our $in, "<$input" or die "Cannot find '$input': $!\n";
open our $out, ">$output" or die "Cannot create '$output': $!\n";
my $insideMultiline = 0;
my $multilineBuffer = "";
my $exception = 0; # Variable indicating whether the current
# multiline-block is a "special" or not
LINE:
while (<$in>) {
if (/^\+Fieldname5/) { # If line starts with +Fieldname5,
# set $exception to "1"
$exception = 1;
}
elsif (/^\s/) { # If line starts with a space,
# set $exception to "0"
$exception = 0;
}
if ($exception == 0 && /^-/) { # If $exception is "0" AND
# the line starts with "-",
# do the following
chomp;
if ($insideMultiline) {
s/^-/ /;
$multilineBuffer .= $_;
}
else {
$insideMultiline = 1;
$multilineBuffer = $_;
}
next LINE;
}
else {
if ($insideMultiline) {
print $out "$multilineBuffer\n";
$insideMultiline = 0;
$multilineBuffer = "";
}
print $out $_;
}
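}
# Flush a merged block that runs up to the end of the file
print $out "$multilineBuffer\n" if $insideMultiline;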
close ($in);
close ($out);
Assuming the only lines which begin with "-" are these multi-line sections, you could do this...
# Open $in and $out as in your original code...
my $insideMultiline = 0;
my $multilineBuffer = "";
LINE:
while (<$in>) {
if (/^-/) {
chomp;
if ($insideMultiline) {
s/^-/ /;
$multilineBuffer .= $_;
}
else {
$insideMultiline = 1;
$multilineBuffer = $_;
}
next LINE;
}
else {
if ($insideMultiline) {
print $out "$multilineBuffer\n";
$insideMultiline = 0;
$multilineBuffer = "";
}
print $out $_;
}
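}
# Flush a merged block that runs up to the end of the file
print $out "$multilineBuffer\n" if $insideMultiline;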
As to the embedded subquestion ("except those pertaining to the last field"), I'd need more detail on the file format to be able to do that. It looks like a blank line separates the sets of fields and contents from one another, but that's not 100% clear in the description. The code above should handle the requirements you laid out at the bottom, though.
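For the "except the last field" part, here is a minimal sketch under two assumptions that the question suggests but does not confirm: the protected field begins at a line starting with "+" followed by its name, and it ends at the next line that starts with neither "+" nor "-" (for example the blank line closing the record). The helper name merge_content_lines and its parameters are made up for illustration; it works on a list of lines rather than on file handles so it can be tried in isolation.

```perl
use strict;
use warnings;

# Hypothetical helper: takes the name of the field whose "-" lines must stay
# untouched, followed by the input lines; returns the rewritten lines
# (without trailing newlines).
sub merge_content_lines {
    my ($keep_field, @lines) = @_;
    my @out;
    my $buffer;      # merged "-" lines collected so far (undef = none pending)
    my $keep = 0;    # true while inside the field that must stay as-is

    for my $line (@lines) {
        chomp(my $text = $line);

        if ($text =~ /^\+/) {
            # A new field starts: note whether it is the protected one.
            $keep = ($text =~ /^\+\Q$keep_field\E/) ? 1 : 0;
        }
        elsif ($text !~ /^-/) {
            # Assumption: any other line (numbers, blank line) ends the field.
            $keep = 0;
        }

        if (!$keep && $text =~ /^-/) {
            if (defined $buffer) {
                # Continuation line: drop the "-" and append after a space.
                $buffer .= ' ' . substr($text, 1);
            }
            else {
                $buffer = $text;
            }
            next;
        }

        # Any non-merged line first flushes a pending merged block.
        if (defined $buffer) {
            push @out, $buffer;
            undef $buffer;
        }
        push @out, $text;
    }

    # Flush a merged block that runs up to the end of the input.
    push @out, $buffer if defined $buffer;
    return @out;
}

my @result = merge_content_lines('Fieldname5',
    "+Fieldname2\n", "-A\n", "-B\n", "-C\n",
    "+Fieldname5\n", "-D\n", "-E\n", "\n");
print "$_\n" for @result;
```

In this sample call, the three content lines of Fieldname2 come out as the single line "-A B C", while "-D" and "-E" under Fieldname5 stay on separate lines. To use it on the real file, you could read export.txt into a list and print each returned line followed by "\n", or keep the streaming loop above and only borrow the $keep logic.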