bashunixterminalbioinformaticsqiime

remove the first 15 characters from every other line in a file


I have some txt files that look like this (they contain DNA sequences and sample codes):

>SRR1502445.1
GACTACACGTAGTATACGAGTGCGTTCCTGCGCTTATTGATATGCTTAAGTTCAGCGGGTAGTCTCACCCGATTTGAGGTCAAGGTTTGTGGGTCGAGTCACAACTCGAACATCGTCTTTGTACAAAGACGGTTGGAAGCGGGTTCCAAGGCAACACAGGGGGATAGGNNNNNNNNNNNNNNNNNNNNNNN
>SRR1502445.2
GACTACACGTAGTATACGAGTGCGTTCCTGCGCTTATTGATATGCTTAAGTTCAGCGGGTAGTCTCACCCGATTTGAGGTCAAGGTTTGTGGGTCGAGTCACAACTCGAACATCGTCTTTGTACAAGACGGTTGGAAGCGGGTTCCAAGGCACACAGGGGATAGGNNN
>SRR1502445.3
GACTACACGTAGTATACGAGTGCGTTCCTGCGCTTATTGATATGCTTAAGTTCAGCGGGTAGTCTCACCCGATTTGAGGTCAAGGTTTGTGGGTCGAGTCACAACTCGAACATCGTCTTTGTACAAAGACGGTTGGAAGCGGGTTCCAAGGCACACAGGGGATAGGNNN
>SRR1502445.4
GACTACACGTAGTATACGAGTGCGTTCCTGCGCTTATTGATATGCTTAAGTTCAGCGGGTAGTCTCACCCGATTTGAGGTCAAGGTTTGTGGGTCGAGTCACAACTCGAACATCGTCTTTGTACAAAGACGGTTGGAAGCGGGTTCCAAGGCACACAGGGGATAGGNNNNNNNNNNN

I would like to remove the first 15 characters of every other line in the file. This would remove the string GACTACACGTAGTAT from the second, fourth, sixth, eighth lines (etc).

For instance the cut command can remove the first 15characters of every line:

cut -c 1-15 /path/to/file.txt

I'd like to apply to to only every other line, starting with the second.


Solution

  • If you don't mind using sedand assuming other line starts with > then the following will remove the first 15 contiguous uppercase characters "A-Z" of the other lines:

    sed 's/^[A-Z]\{15\}//' file > new_file

    Or, in place edit (GNU sed) use -i:

    sed -i 's/^[A-Z]\{15\}//' file

    Or, in place edit (BSD sed) use -i '':

    sed -i '' 's/^[A-Z]\{15\}//' file

    Or, back it up:

    sed -i.bak 's/^[A-Z]\{15\}//' file

    Example:

    $ cat file
    >SRR1502445.1
    GACTACACGTAGTATACGAGTGCGTTCCTGCGCTTATTGATATGCTTAAGTTCAGCGGGTAGTCTCACCCGATTTGAGGTCAAGGTTTGTGGGTCGAGTCACAACTCGAACATCGTCTTTGTACAAAGACGGTTGGAAGCGGGTTCCAAGGCAACACAGGGGGATAGGNNNNNNNNNNNNNNNNNNNNNNN
    >SRR1502445.2
    GACTACACGTAGTATACGAGTGCGTTCCTGCGCTTATTGATATGCTTAAGTTCAGCGGGTAGTCTCACCCGATTTGAGGTCAAGGTTTGTGGGTCGAGTCACAACTCGAACATCGTCTTTGTACAAGACGGTTGGAAGCGGGTTCCAAGGCACACAGGGGATAGGNNN
    >SRR1502445.3
    GACTACACGTAGTATACGAGTGCGTTCCTGCGCTTATTGATATGCTTAAGTTCAGCGGGTAGTCTCACCCGATTTGAGGTCAAGGTTTGTGGGTCGAGTCACAACTCGAACATCGTCTTTGTACAAAGACGGTTGGAAGCGGGTTCCAAGGCACACAGGGGATAGGNNN
    >SRR1502445.4
    GACTACACGTAGTATACGAGTGCGTTCCTGCGCTTATTGATATGCTTAAGTTCAGCGGGTAGTCTCACCCGATTTGAGGTCAAGGTTTGTGGGTCGAGTCACAACTCGAACATCGTCTTTGTACAAAGACGGTTGGAAGCGGGTTCCAAGGCACACAGGGGATAGGNNNNNNNNNNN
    $ sed 's/^[A-Z]\{15\}//' file
    >SRR1502445.1
    ACGAGTGCGTTCCTGCGCTTATTGATATGCTTAAGTTCAGCGGGTAGTCTCACCCGATTTGAGGTCAAGGTTTGTGGGTCGAGTCACAACTCGAACATCGTCTTTGTACAAAGACGGTTGGAAGCGGGTTCCAAGGCAACACAGGGGGATAGGNNNNNNNNNNNNNNNNNNNNNNN
    >SRR1502445.2
    ACGAGTGCGTTCCTGCGCTTATTGATATGCTTAAGTTCAGCGGGTAGTCTCACCCGATTTGAGGTCAAGGTTTGTGGGTCGAGTCACAACTCGAACATCGTCTTTGTACAAGACGGTTGGAAGCGGGTTCCAAGGCACACAGGGGATAGGNNN
    >SRR1502445.3
    ACGAGTGCGTTCCTGCGCTTATTGATATGCTTAAGTTCAGCGGGTAGTCTCACCCGATTTGAGGTCAAGGTTTGTGGGTCGAGTCACAACTCGAACATCGTCTTTGTACAAAGACGGTTGGAAGCGGGTTCCAAGGCACACAGGGGATAGGNNN
    >SRR1502445.4
    ACGAGTGCGTTCCTGCGCTTATTGATATGCTTAAGTTCAGCGGGTAGTCTCACCCGATTTGAGGTCAAGGTTTGTGGGTCGAGTCACAACTCGAACATCGTCTTTGTACAAAGACGGTTGGAAGCGGGTTCCAAGGCACACAGGGGATAGGNNNNNNNNNNN
    $