perl variables implicit chomp

How do I correctly use chomp to remove the \n character in Perl?


My question is very simple: I have a database that looks like this: [image]

My goal is just to remove the newline \n at the end of every sequence line, NOT of the header. I tried the following code:

#!/usr/bin/perl
use strict;
my $db = shift;
my $outfile= "Silva_chomped_for_R_fin.fasta";
my $header;
my $seq;
my $kick = ">";

open(FASTAFILE, $db);
open(OUTFILE,">". $outfile);

while(<FASTAFILE>) {
    my $currentline = $_;
    chomp $currentline;
    if ($currentline =~ m/^$kick/) {
        $header = $currentline;
    } else {
        chomp $currentline;
        $seq = $currentline;
    }
    my $path = $header.$seq."\n";
    print(OUTFILE $path);
}

close OUTFILE;
close FASTAFILE;
exit;

But instead of having just the sequence lines chomped, I obtain the following: [image]

It is as if chomp didn't work at all. Any idea what I am doing wrong? Thanks a lot, Alfredo


Solution

  • There are three issues with your while() loop.

    Here is a simplified version.

    use strict;
    use warnings;
    
    my $db = shift;
    my $outfile = "out.fasta";
    
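    open(my $fh, "<", $db) or die "Could not open input file '$db': $!";
    open(my $out, ">", $outfile) or die "Could not open output file '$outfile': $!";
    
    my $header;
    
    while (<$fh>) {
        $header = /^>/;          # true when the line is a header
        chomp unless $header;    # strip the newline from sequence lines only
        # Before every header except the first line, end the previous sequence.
        print $out $. > 1 && $header && "\n", $_;
    }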
    
    close $out;
    close $fh;
    

    The line

    print $out $. > 1 && $header && "\n", $_;
    

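    will conditionally prepend a newline to the output if the current line begins with a '>', unless it is the first line in the file. The && chain evaluates to "\n" only when both conditions are true; otherwise it evaluates to an empty string, so nothing extra is printed. (The $. variable holds the current line number of the input file.)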

    Credit: ikegami spotted the failure in my original code to allow for more than one sequence within the input database.
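
    To make the effect on a database with more than one sequence easier to see, here is a minimal sketch of the same loop with the && chain spelled out as an explicit if. The sub name join_sequence_lines and the sample data are assumptions made for illustration, not part of the answer above. It reads from an in-memory filehandle instead of a file so it can be run as-is.

```perl
use strict;
use warnings;

# Hypothetical helper (not from the answer above): takes FASTA text as a
# string and returns it with each record's sequence lines joined together.
sub join_sequence_lines {
    my ($fasta_text) = @_;
    my $joined = '';

    # Open an in-memory filehandle on the string so the loop reads it
    # line by line, just as it would read a file.
    open(my $in, '<', \$fasta_text)
        or die "Could not open in-memory handle: $!";

    while (my $line = <$in>) {
        if ($line =~ /^>/) {
            # Header: end the previous record's sequence line first,
            # except when this header is the first line of the input.
            $joined .= "\n" if $. > 1;
        }
        else {
            # Sequence line: drop the trailing newline so that
            # consecutive sequence lines run together.
            chomp $line;
        }
        $joined .= $line;
    }
    close $in;

    return $joined;
}

# Made-up sample data with two records.
my $example = ">seq1\nACGT\nTTGA\n>seq2\nGGCC\nAATT\n";
print join_sequence_lines($example), "\n";
```

    With this sample input, the script prints each header on its own line followed by its joined sequence (ACGTTTGA for seq1, GGCCAATT for seq2). As in the loop above, the last sequence is not followed by a newline, which is why the example adds one when printing.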