I have data in this format:
a1 1901 4
a1 1902 5
a3 1902 6
a4 1902 7
a4 1903 8
a5 1903 9
I want to calculate the cumulative score (3rd column) for each entity in the first column. I tried building a hash, and my code looks like this:
use strict;
use warnings;
use Data::Dumper;

my $file = shift;
open my $fh, '<', $file or die "Cannot open '$file': $!";

my %hash;
while ( my $line = <$fh> ) {
    chomp $line;
    my ($protein, $year, $score) = split /\s+/, $line;
    # protein => year => list of scores for that year
    push @{ $hash{$protein}{$year} }, $score;
}
close $fh;

print Dumper \%hash;
The output looks like this:
$VAR1 = {
          'a3' => {
                    '1902' => [
                                6
                              ]
                  },
          'a1' => {
                    '1902' => [
                                5
                              ],
                    '1901' => [
                                4
                              ]
                  },
          'a4' => {
                    '1903' => [
                                8
                              ],
                    '1902' => [
                                7
                              ]
                  },
          'a5' => {
                    '1903' => [
                                9
                              ]
                  }
        };
I now want to go through each entity in column 1 (a1, a3, a4, a5) and add up its scores, so the desired output is something like this:
a1 1901 4
a1 1902 9 # 4+5
a3 1902 6
a4 1902 7
a4 1903 15 # 7+8
a5 1903 9
But I can't work out how to loop over the values of the hash I built in order to add them up.
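If the data is always sorted as you show it (all lines for one protein are adjacent), then you can process it as you read it from the file, without building a hash at all:
use strict;
use warnings;

my ($current, $total) = ('', 0);
while ( <DATA> ) {                          # reads lines from the DATA filehandle
    my ($protein, $year, $score) = split;   # split on whitespace in $_
    $total = 0 unless $protein eq $current; # new protein: restart the running sum
    $total += $score;
    print "$protein $year $total\n";
    $current = $protein;
}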
Output:
a1 1901 4
a1 1902 9
a3 1902 6
a4 1902 7
a4 1903 15
a5 1903 9
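If the input might not be grouped by protein, you can still build the hash from the question and then walk it. Below is a minimal sketch, assuming the same whitespace-separated three-column format; `build_hash` and `cumulative_lines` are hypothetical helper names, and the sample lines are the question's data deliberately shuffled. It visits proteins in string order and years in numeric order, summing each year's array before adding it to that protein's running total:

```perl
use strict;
use warnings;
use List::Util qw(sum);

# Hypothetical helper: build protein => year => [scores] from a filehandle,
# the same structure as in the question. Input order does not matter.
sub build_hash {
    my ($fh) = @_;
    my %hash;
    while ( my $line = <$fh> ) {
        my ($protein, $year, $score) = split ' ', $line;
        next unless defined $score;    # skip blank or short lines
        push @{ $hash{$protein}{$year} }, $score;
    }
    return \%hash;
}

# Hypothetical helper: walk the nested hash and return
# "protein year running_total" lines.
sub cumulative_lines {
    my ($hash) = @_;
    my @out;
    for my $protein ( sort keys %$hash ) {
        my $total = 0;    # restart the sum for each protein
        for my $year ( sort { $a <=> $b } keys %{ $hash->{$protein} } ) {
            # each year holds an array ref; add all of its scores
            $total += sum @{ $hash->{$protein}{$year} };
            push @out, "$protein $year $total";
        }
    }
    return @out;
}

# Sample input: the question's data in shuffled order (assumption for the demo).
my $input = <<'END';
a4 1903 8
a1 1902 5
a5 1903 9
a1 1901 4
a4 1902 7
a3 1902 6
END

# Read the sample through an in-memory filehandle.
open my $fh, '<', \$input or die "Cannot open in-memory input: $!";
print "$_\n" for cumulative_lines( build_hash($fh) );
close $fh;
```

Because each year holds an array, duplicate protein/year lines are summed together rather than overwriting each other. To read a real file instead of the sample, open it with `open my $fh, '<', $file or die ...` and pass that handle to `build_hash`.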