perl

Parsing a file by summing the columns of each block of rows separated by blank lines


I have an input file like the one below:

#
volume stats
start_time  1
length      2
--------
ID
0x00a,1,2,3,4
0x00b,11,12,13,14
0x00c,21,22,23,24

volume stats
start_time  2
length      2
--------
ID
0x00a,31,32,33,34
0x00b,41,42,43,44
0x00c,51,52,53,54

volume stats
start_time  3
length      2
--------
ID
0x00a,61,62,63,64
0x00b,71,72,73,74
0x00c,81,82,83,84
#

I need output in the format below:

1 33    36  39  42
2 123   126 129 132
3 213   216 219 222
#

Below is my code:

#!/usr/bin/perl
use strict;
use warnings;
#use File::Find;

# Define file names and its location
my $input = $ARGV[0];

# Grab the vols stats for different intervals
open (INFILE,"$input") or die "Could not open sample.txt: $!";
my $date_time;
my $length;
my $col_1;
my $col_2;
my $col_3;
my $col_4;
foreach my $line (<INFILE>)
{

    if ($line =~ m/start/)
        {
            my @date_fields = split(/   /,$line);
            $date_time = $date_fields[1];
        }
    if ($line =~ m/length/i)
        {
            my @length_fields = split(/ /,$line);
            $length = $length_fields[1];
        }
    if ($line =~ m/0[xX][0-9a-fA-F]+/)
        {
            my @volume_fields = split(/,/,$line);
            $col_1 += $volume_fields[1];
            $col_2 += $volume_fields[2];
            $col_3 += $volume_fields[3];
            $col_4 += $volume_fields[4];
            #print "$col_1\n";
        }
    if ($line =~ /^$/)
        {
            print "$date_time $col_1 $col_2 $col_3 $col_4\n";
                $col_1=0;$col_2=0;$col_3=0;$col_4=0;
        }
}
close (INFILE);
#

My code's result is:

1
 33 36 39 42
2
 123 126 129 132
#

Basically, for each time interval, I want to sum each column across all the lines of that interval and print the sums next to the interval's start time.


Solution

  • $/ (the input record separator) is your friend here. Try setting it to '' to enable paragraph mode, in which each read returns a whole block of lines up to the next blank line.

    #!/usr/bin/env perl
    
    use strict;
    use warnings;
    
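    local $/ = '';    # paragraph mode: each read returns one blank-line-separated block
    
    while ( <> ) {
        # Pull the header values out of the current block.
        my ( $start ) = m/start_time\s+(\d+)/;
        my ( $length ) = m/length\s+(\d+)/;    # captured, but not used in the output
        my @row_sum; 
        # Each data line starts with a hex ID; sum the remaining fields column by column.
        for ( m/(0x.*)/g )  {
            my ( $key, @values ) = split /,/; 
            for my $index ( 0..$#values ) {
               $row_sum[$index] += $values[$index];
            }
        }
        print join ( "\t", $start, @row_sum ), "\n";
    }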
    

    Output:

    1       33      36      39      42
    2       123     126     129     132
    3       213     216     219     222
    

    NB: the output is tab-separated. You can use printf/sprintf if you need more control over the layout.
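
    As a rough sketch of the sprintf route: the helper name format_row and the column widths below are arbitrary choices for illustration, not anything taken from the question.

    ```perl
    use strict;
    use warnings;

    # Hypothetical helper: format a start time and its column sums into
    # fixed-width columns instead of tab-separated ones.
    sub format_row {
        my ( $start, @sums ) = @_;
        # %-4s left-justifies the start time in 4 characters;
        # each sum is right-justified in 6 characters.
        return sprintf( '%-4s' . ( '%6d' x @sums ), $start, @sums );
    }

    print format_row( 1, 33,  36,  39,  42 ),  "\n";
    print format_row( 2, 123, 126, 129, 132 ), "\n";
    ```

    Building the format string from the number of sums means the same helper keeps working if the input gains or loses a column.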

    I would also suggest that instead of:

    my $input = $ARGV[0]; 
    open (my $input_fh, '<', $input) or die "Could not open $input: $!";
    

    You would be better off with:

    while ( <> ) { 
    

    Because <> is the magic filehandle in Perl: it opens each file named on the command line and reads them one at a time, and if no file is given it reads STDIN. This is just how grep/sed/awk do it.

    So you can still run this with scriptname.pl sample.txt, or you can do curl http://somewebserver/sample.txt | scriptname.pl, or scriptname.pl sample.txt anothersample.txt moresample.txt

    Also, if you want to open the file yourself, you're better off using a lexical filehandle and the three-argument form of open:

    open ( my $input_fh, '<', $ARGV[0] ) or die $!; 
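
    If you do go the explicit-filehandle route, one possible arrangement is to keep the paragraph-mode setting inside a sub, so that local restores $/ as soon as the sub returns. This is only a sketch: sum_blocks is a made-up name, and the in-memory filehandle is there purely so the example runs without a file on disk.

    ```perl
    use strict;
    use warnings;

    # Hypothetical helper: read blank-line-separated blocks from an open
    # filehandle and return one [ start_time, sum1, sum2, ... ] row per block.
    sub sum_blocks {
        my ($fh) = @_;
        local $/ = '';    # paragraph mode, restored when this sub returns
        my @rows;
        while ( my $block = <$fh> ) {
            my ($start) = $block =~ m/start_time\s+(\d+)/;
            next unless defined $start;    # skip blocks without a start_time
            my @sums;
            for my $line ( $block =~ m/^(0x.*)$/mg ) {
                my ( undef, @values ) = split /,/, $line;    # drop the hex ID
                $sums[$_] += $values[$_] for 0 .. $#values;
            }
            push @rows, [ $start, @sums ];
        }
        return @rows;
    }

    # Demo data: the first two blocks from the question, held in a string.
    my $sample = join "\n",
        'volume stats', 'start_time  1', 'length      2', '--------', 'ID',
        '0x00a,1,2,3,4', '0x00b,11,12,13,14', '0x00c,21,22,23,24', '',
        'volume stats', 'start_time  2', 'length      2', '--------', 'ID',
        '0x00a,31,32,33,34', '0x00b,41,42,43,44', '0x00c,51,52,53,54', '';

    # With a real file, pass the handle from the three-argument open instead.
    open( my $fh, '<', \$sample ) or die "Could not open in-memory sample: $!";
    print join( "\t", @$_ ), "\n" for sum_blocks($fh);
    close $fh;
    ```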
    

    And you really shouldn't ever be using 'numbered' variables like $col_1 etc. If there are numbers in the names, then an array is almost always better.
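
    For comparison, here is a sketch of the original line-by-line approach with an array in place of $col_1 .. $col_4. It also addresses the two symptoms visible in the question's output, assuming the input looks like the sample: the start time still carried its trailing newline (hence the line break after it), and the last block was never printed because no blank line follows it. The name sum_lines is made up for this example.

    ```perl
    use strict;
    use warnings;

    # Hypothetical helper: keep running sums in an array, emit a row at each
    # blank line, and emit once more at the end of the input.
    sub sum_lines {
        my (@lines) = @_;
        my ( $start, @sums, @rows );
        for my $line (@lines) {
            chomp $line;    # drop the newline so it cannot leak into the output
            if ( $line =~ m/^start_time\s+(\d+)/ ) {
                $start = $1;
            }
            elsif ( $line =~ m/^0x[0-9a-f]+,/i ) {
                my ( undef, @values ) = split /,/, $line;    # drop the hex ID
                $sums[$_] += $values[$_] for 0 .. $#values;
            }
            elsif ( $line =~ m/^\s*$/ && defined $start ) {
                push @rows, join( ' ', $start, @sums );
                ( $start, @sums ) = ();    # reset for the next block
            }
        }
        # The last block has no blank line after it, so flush it here.
        push @rows, join( ' ', $start, @sums ) if defined $start;
        return @rows;
    }

    my @demo = (
        "start_time  1\n", "0x00a,1,2,3,4\n", "0x00b,11,12,13,14\n", "\n",
        "start_time  2\n", "0x00a,31,32,33,34\n",
    );
    print "$_\n" for sum_lines(@demo);
    ```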