
How to concatenate every block of four lines onto the first four lines of a file


I'm not sure of the best way to word this question, but I'm trying to concatenate the first four lines with the next four lines, and so on until the end of the file.

My data looks like:

aggaacgtgagttgaaaattgaagcgacaaacttggtttcatgtcctgtttgtggaaaga
catctattgttagagacaatatattgtctgatctgacttatctgcatgttc---------
 .     **    ..* * *. * .* * .*..**..**  .  * ****.         

gcataaaaggaatggacacaatcataaatgaacatcttgatatctgccttacaagaaggt
----------tgtggattcctttctttttccttttggagatatctgccttacaagaaggt
           .****. *  *. *   *   . *   **********************

ccaaacgaaaacttacccaacgcacactacttcagtttggtgttggatcaagtaccaaaa
ccaaacgaaaacttacccaacgcacactacttcagtttggtgttggatcaagtaccaaaa
************************************************************

I want to append each block of four lines to the block before it, line by line, to produce a horizontal format that looks like this:

aggaacgtgagttgaaaattgaagcgacaaacttggtttcatgtcctgtttgtggaaagagcataaaaggaatggacacaatcataaatgaacatcttgatatctgccttacaagaaggtccaaacgaaaacttacccaacgcacactacttcagtttggtgttggatcaagtaccaaaa
catctattgttagagacaatatattgtctgatctgacttatctgcatgttc-------------------tgtggattcctttctttttccttttggagatatctgccttacaagaaggtccaaacgaaaacttacccaacgcacactacttcagtttggtgttggatcaagtaccaaaa
 .     **    ..* * *. * .* * .*..**..**  .  * ****.                    .****. *  *. *   *   . *   **********************************************************************************

I know I can use paste - - to join every pair of consecutive lines, but what would be the simplest way to join line 1 with lines 5, 9, ..., line 2 with lines 6, 10, ..., and so on through the whole file?


Solution

  • You could use a short Perl script:

    #!/usr/bin/env perl

    use strict;
    use warnings;

    my @rows;                        # $rows[0..3] collect the joined lines

    while (<>) {                     # read from files named on the command line, or stdin
        chomp;                       # remove the trailing newline
        my $idx = ($. - 1) % 4;      # position of this line within its 4-line block: 0..3
        $rows[$idx] .= $_;           # append the current line to the row for that position
    }

    # Done; print the four joined rows (an empty line if a row never received data)
    for my $i (0 .. 3) {
        print(($rows[$i] // ''), "\n");
    }
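If Perl isn't at hand, the same idea (assign each line to one of four slots by its line number, append it there, then print the slots) can be sketched as a one-pass awk program wrapped in a shell function. This is a minimal sketch, assuming every record is exactly four lines long (two sequences, the match line, and a blank separator) as in the sample data; `concat_every_four` is a hypothetical name chosen for illustration.

```shell
#!/bin/sh

# concat_every_four (hypothetical name): join line 1 with lines 5, 9, ...,
# line 2 with lines 6, 10, ..., and so on, assuming 4-line records.
# Reads the files given as arguments, or stdin when there are none.
concat_every_four() {
    awk '
        {
            slot = (NR - 1) % 4          # position of this line in its block: 0..3
            row[slot] = row[slot] $0     # append the whole line, keeping its spaces
        }
        END {
            for (i = 0; i < 4; i++)      # print the four joined rows in order
                print row[i]
        }
    ' "$@"
}
```

Usage: concat_every_four input.txt > output.txt. Like the Perl version, it keeps leading and trailing spaces on the match lines, which is what keeps the "*" and "." markers aligned under the joined sequences.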