linuxperlcsvmergeperlscript

How to merge records in csv file based on first field?


I have a csv file,

 id1,v1,v2,v3,v4
 id2,v1,v2,v6,v4
 id1,v7,v8,v3,v9
 id1,v10,v11,v12,v13
 id2,v3,v5,v8,v7

since, the file is not sorted, and should not be! I want output as:

 id1,v1|v7|v10,v2|v8|v11,v3|v12,v4|v9|v13
 id2,v1|v10,v2|v5,v6|v8,v4|v7

Where, all respective values in columns are merged to respective column in record with same id, except repeated value (see v3 in 3rd column for id1) and id.

I tried it using code given here http://www.robelle.com/tips/st-export-notes.html. But it needs much more than that.

How this can be achieved using perl? I am new to perl. Thanks in advance!


Solution

  • Assuming you don't need any particular sort order, you can use a hash of arrays to solve this. Hashes are known as dictionaries in other languages.

    use strict;
    use warnings;
    
    my %data;
    
    while ( <DATA> ) {
      my ($id, @vals) = /[^,\s]+/g;
      for my $i ( 0 .. $#vals ) {
        ++$data{$id}[$i]{$vals[$i]};
      }
    }
    
    while ( my ($id, $vals) = each %data ) {
      my @vals = map { join '|', keys %$_ } @$vals;
      printf "%s,%s\n", $id, join ',', @vals;
    }
    
    __DATA__
    id1,v1,v2,v3,v4
    id2,v1,v2,v6,v4
    id1,v7,v8,v3,v9
    id1,v10,v11,v12,v13
    id2,v3,v5,v8,v7
    

    output

    id2,v1|v3,v5|v2,v8|v6,v7|v4
    id1,v7|v10|v1,v11|v2|v8,v12|v3,v4|v13|v9