I am trying to merge two files using Perl.
My code so far:
my $hash_ref;
open (my $I_fh, "<", "File1.txt") or die $!;
my $line = <$I_fh>;
while ($line = <$I_fh>) {
    chomp $line;
    my @cols = split ("\t", $line);
    my $key = $cols[1];
    $hash_ref -> {$key} = \@cols;
}
close $I_fh;
open (my $O_fh, "<", "File2.txt") or die $!;
while ($line = <$O_fh>) {
    chomp $line;
    my @cols = split ("\t", $line);
    my $key = shift (@cols);
    push (@{$hash_ref -> {$key}}, @cols);
}
close $O_fh;
open (my $out, ">", "merged.txt") or die $!;
foreach my $key (sort keys %$hash_ref) {
    my $row = join ("\t", @{$hash_ref -> {$key}});
    print $out "$key\t$row\n";
}
close $out;
I am using print and the Dumper function to check every step. In the terminal window, everything looks fine. However, in my output file (merged.txt), the format has changed. I would like to merge the two files by adding more columns, not by adding more rows. How can I fix my code?
File1.txt:
Index Name Column1 Column2
1 A1 AB
2 A2 CD
3 B1 EF
4 B2 GH
File2.txt:
Name Type
A1 1
A2 1
B1 2
B2 1
Merged file:
A1 1 AB
1
A2 2 CD
1
B1 3 EF
2
B2 4 GH
1
Wanted file:
Name Type Column2
A1 1 AB
A2 1 CD
B1 2 EF
B2 1 GH
Assuming the files are sorted based on the name column, this is really easy to do thanks to the join(1) program:
$ join --header -t $'\t' -o 2.1,2.2,1.4 -1 2 -2 1 file1.tsv file2.tsv
Name Type Column2
A1 1 AB
A2 1 CD
B1 2 EF
B2 1 GH
The --header option is a GNU extension that excludes the first lines of the two files from being joined and treats them as column titles instead. -t sets the column separator, -o controls which columns are included in the output (a list of FILE.COLUMN specifiers), and -1 and -2 choose the columns that are used to join the two files.
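If the inputs are not already sorted on the name column, one option is to sort the body of each file first while leaving the header line in place. The following is only a sketch: the sample data, the temporary directory, and the sort_body helper are illustrative and not part of the question, and --header still requires GNU join.

```shell
#!/bin/sh
# Sketch: sort each file on its join column (header kept first), then join.
# File names, sample data, and the sort_body helper are illustrative assumptions.
tab=$(printf '\t')
tmp=$(mktemp -d)

# Sample data in deliberately unsorted order; Column1 is empty in file1.
printf 'Index\tName\tColumn1\tColumn2\n3\tB1\t\tEF\n1\tA1\t\tAB\n4\tB2\t\tGH\n2\tA2\t\tCD\n' > "$tmp/file1.tsv"
printf 'Name\tType\nB2\t1\nA1\t1\nB1\t2\nA2\t1\n' > "$tmp/file2.tsv"

# Print the header unchanged, then the remaining lines sorted on column $2.
sort_body() {
    head -n 1 "$1"
    tail -n +2 "$1" | LC_ALL=C sort -t "$tab" -k "$2,$2"
}

sort_body "$tmp/file1.tsv" 2 > "$tmp/file1.sorted.tsv"
sort_body "$tmp/file2.tsv" 1 > "$tmp/file2.sorted.tsv"

# Same join invocation as above, run on the sorted copies.
merged=$(LC_ALL=C join --header -t "$tab" -o 2.1,2.2,1.4 -1 2 -2 1 \
    "$tmp/file1.sorted.tsv" "$tmp/file2.sorted.tsv")
printf '%s\n' "$merged"

rm -rf "$tmp"
```

Using LC_ALL=C for both sort and join keeps the two tools in agreement about collation order, which join depends on.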
If they're not sorted, or if you're set on Perl, your code is very close; typos aside, you're printing out every column, not just the ones your desired output suggests you care about. Consider:
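#!/usr/bin/perl
use warnings;
use strict;
use feature qw/say/;
use autodie;    # open dies with a useful message on failure

# Maps each name to a list of rows, one per file, in the order the files are read.
my %names;

# Read a tab-separated file and store each row under the value in column $idx.
sub read_file {
    my ($file, $idx) = @_;
    open my $in, "<", $file;
    my $header = <$in>;    # skip the header line
    while (<$in>) {
        chomp;
        my @F = split /\t/;
        push @{$names{$F[$idx]}}, \@F;
    }
}

read_file "file1.tsv", 1;    # Name is the second column
read_file "file2.tsv", 0;    # Name is the first column

say "Name\tType\tColumn2";
for my $n (sort keys %names) {
    my $row = $names{$n};
    # $row->[1] is the file2 row (Type), $row->[0] is the file1 row (Column2)
    say "$n\t$row->[1][1]\t$row->[0][3]";
}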
I also suspect your strange output might be explained by running your program on data files that use Windows-style line endings when your OS uses Unix-style line endings.
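To check that suspicion, you could count the lines that contain a carriage return and, if there are any, strip them before merging. This is a rough sketch with made-up sample data and illustrative file names; chomp only removes the trailing "\n" on a Unix system, so a leftover "\r" at the end of the last field could make some viewers show each merged row as two lines.

```shell
#!/bin/sh
# Sketch: detect and remove Windows-style carriage returns from a data file.
# The sample file and its name are illustrative assumptions.
cr=$(printf '\r')
tmp=$(mktemp -d)

# Sample file written with CRLF line endings.
printf 'Name\tType\r\nA1\t1\r\nA2\t1\r\n' > "$tmp/file2.txt"

# Count lines containing a carriage return (grep -c prints 0 when none match).
cr_before=$(grep -c "$cr" "$tmp/file2.txt" || true)

# Delete every carriage return, leaving plain "\n" line endings.
tr -d '\r' < "$tmp/file2.txt" > "$tmp/file2.unix.txt"

cr_after=$(grep -c "$cr" "$tmp/file2.unix.txt" || true)
printf 'lines with CR before: %s, after: %s\n' "$cr_before" "$cr_after"

rm -rf "$tmp"
```

If the count for your real files is non-zero, converting them this way (or removing the "\r" inside the Perl loop right after chomp) should keep each merged record on one line.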