[SOLVED] How to merge records in csv file based on first field?

How to merge records in csv file based on first field?

I have a csv file,

 id1,v1,v2,v3,v4
 id2,v1,v2,v6,v4
 id1,v7,v8,v3,v9
 id1,v10,v11,v12,v13
 id2,v3,v5,v8,v7

since, the file is not sorted, and should not be! I want output as:

 id1,v1|v7|v10,v2|v8|v11,v3|v12,v4|v9|v13
 id2,v1|v10,v2|v5,v6|v8,v4|v7

Where, all respective values in columns are merged to respective column in record with same id, except repeated value (see v3 in 3rd column for id1) and id.

I tried it using code given here http://www.robelle.com/tips/st-export-notes.html. But it needs much more than that.

How this can be achieved using perl? I am new to perl. Thanks in advance!

Solution

Assuming you don't need any particular sort order, you can use a hash of arrays to solve this. Hashes are known as dictionaries in other languages.

use strict;
use warnings;

my %data;

while ( <DATA> ) {
  my ($id, @vals) = /[^,\s]+/g;
  for my $i ( 0 .. $#vals ) {
    ++$data{$id}[$i]{$vals[$i]};
  }
}

while ( my ($id, $vals) = each %data ) {
  my @vals = map { join '|', keys %$_ } @$vals;
  printf "%s,%s\n", $id, join ',', @vals;
}

__DATA__
id1,v1,v2,v3,v4
id2,v1,v2,v6,v4
id1,v7,v8,v3,v9
id1,v10,v11,v12,v13
id2,v3,v5,v8,v7

output

id2,v1|v3,v5|v2,v8|v6,v7|v4
id1,v7|v10|v1,v11|v2|v8,v12|v3,v4|v13|v9