arrays perl hash max minimum-size

Using hash keys as arrays to get the minimum and maximum numbers when a column's values are equal


I have the following data:

miRNA17 70      105     dvex699824      12      233
miRNA17 21      60      dvex699824      42      20
miRNA17 55      89      dvex699824      6       40
miRNA18 58      85      dvex701176      119     92
miRNA17 66      105     dvex703815      35      75
miRNA17 31      71      dvex703815      43      83
miRNA17 39      79      dvex703815      43      83
miRNA2  28      56      dvex731981      313     286
miRNA17 10      70      dvex735428      142     203
miRNA17 29      91      dvex735428      213     152
miRNA17 66      105     dvex735668      163     125

The question is: given these 6 columns, I need to group the rows and print them according to these rules:

same miRNA## \t any value \t any value \t same dvex#### \t take the lowest \t take the highest

For example, this is a possible output:

miRNA17 21     105   dvex699824   6    233
miRNA18 58     85    dvex701176   119  92
miRNA17 31     105   dvex703815   35   83
miRNA2  28     56    dvex731981   313  286
miRNA17 10     105   dvex735428   142  203

How can I solve this problem using hash keys as arrays?


Solution

  • Perl script:

    use strict;
    use warnings;
    
    # Not shown: parse the data file and store the rows in an array of arrays.
    
    my @data = (
        [ 'miRNA17', 70, 105, 'dvex699824',  12, 233 ],
        [ 'miRNA17', 21,  60, 'dvex699824',  42,  20 ],
        [ 'miRNA17', 55,  89, 'dvex699824',   6,  40 ],
        [ 'miRNA18', 58,  85, 'dvex701176', 119,  92 ],
        [ 'miRNA17', 66, 105, 'dvex703815',  35,  75 ],
        [ 'miRNA17', 31,  71, 'dvex703815',  43,  83 ],
        [ 'miRNA17', 39,  79, 'dvex703815',  43,  83 ],
        [ 'miRNA2',  28,  56, 'dvex731981', 313, 286 ],
        [ 'miRNA17', 10,  70, 'dvex735428', 142, 203 ],
        [ 'miRNA17', 29,  91, 'dvex735428', 213, 152 ],
        [ 'miRNA17', 66, 105, 'dvex735668', 163, 125 ]
    );
    
    my %results;
    
    foreach my $record (@data) {
        my ($mirna, $col2, $col3, $dvex, $col5, $col6) = @$record;
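        $results{$mirna}{$dvex}{col2} = $col2; # don't care: the last value seen wins.
        $results{$mirna}{$dvex}{col3} = $col3; # don't care: the last value seen wins.
        # Keep the lowest col5. Testing with defined() means a legitimate 0 is not treated as unset.
        $results{$mirna}{$dvex}{col5} = $col5
            if not defined $results{$mirna}{$dvex}{col5} or $results{$mirna}{$dvex}{col5} > $col5;
        # Keep the highest col6.
        $results{$mirna}{$dvex}{col6} = $col6
            if not defined $results{$mirna}{$dvex}{col6} or $results{$mirna}{$dvex}{col6} < $col6;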
    }
    
    
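    # The outer keys are not sorted, so the order of the miRNA groups is unspecified
    # and may differ between runs; use "sort keys %results" for a stable order.
    foreach my $mirna (keys %results) {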
        foreach my $dvex (sort keys %{$results{$mirna}}) {
            printf "%-8s  %5d  %5d  %-10s  %3d %3d\n",
                $mirna, $results{$mirna}{$dvex}{col2}, $results{$mirna}{$dvex}{col3},
                $dvex, $results{$mirna}{$dvex}{col5}, $results{$mirna}{$dvex}{col6};
        }
    }
    
    1;
    

    Output:

    miRNA2       28     56  dvex731981  313 286
    miRNA17      55     89  dvex699824    6 233
    miRNA17      39     79  dvex703815   35  83
    miRNA17      29     91  dvex735428  142 203
    miRNA17      66    105  dvex735668  163 125
    miRNA18      58     85  dvex701176  119  92
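
On the "hash keys as arrays" part of the question: Perl hash keys are always strings, so an array cannot serve as a key directly. A common workaround is to join the grouping columns into one composite string key and store an array reference as the value. The following is a minimal sketch of that idea, not the method used in the script above. The helper name group_min_max and the in-memory sample are assumptions made for illustration, and it reads whitespace-separated lines from a filehandle, which also covers the parsing step the script leaves out. Unlike the script above, it keeps columns 2 and 3 from the first row of each group and prints groups in the order they first appear.

```perl
use strict;
use warnings;

# Hypothetical helper (not part of the answer above): groups rows by
# (miRNA, dvex) and keeps the lowest column 5 and the highest column 6.
sub group_min_max {
    my ($fh) = @_;
    my (%group, @order);

    while (my $line = <$fh>) {
        next unless $line =~ /\S/;    # skip blank lines
        my ($mirna, $col2, $col3, $dvex, $col5, $col6) = split ' ', $line;

        # Hash keys are strings, so join the two grouping columns into
        # one composite key instead of trying to use an array as the key.
        my $key = join "\t", $mirna, $dvex;

        if (my $row = $group{$key}) {
            $row->[4] = $col5 if $col5 < $row->[4];    # lowest column 5
            $row->[5] = $col6 if $col6 > $row->[5];    # highest column 6
        }
        else {
            # First row of a group: store it and remember its position.
            $group{$key} = [ $mirna, $col2, $col3, $dvex, $col5, $col6 ];
            push @order, $key;
        }
    }
    return map { $group{$_} } @order;
}

# Assumed sample input: the first four rows from the question.
my $sample = <<'END';
miRNA17 70      105     dvex699824      12      233
miRNA17 21      60      dvex699824      42      20
miRNA17 55      89      dvex699824      6       40
miRNA18 58      85      dvex701176      119     92
END

open my $fh, '<', \$sample or die "Cannot open in-memory file: $!";
my @rows = group_min_max($fh);
close $fh;

printf "%-8s  %5d  %5d  %-10s  %3d %3d\n", @$_ for @rows;
```

With this sample, the three miRNA17/dvex699824 rows collapse into one row carrying 6 and 233 in the last two columns, and the single miRNA18 row passes through unchanged. To read a real file instead, open it with a normal three-argument open and pass that filehandle to the helper.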