I am running ipcress for in silico PCR and the results look like this:
Ipcress result
Experiment: Primer1
Primers: B A
Target: QLOD02000001.1:filter(unmasked), whole genome shotgun sequence
Matches: 20/20 20/20
Product: 2601 bp (range 100-5000)
Result type: revcomp
ipcress: QLOD02000001.1:filter(unmasked) Primer1 2601 B 91258 0 A 93839 0 revcomp
>F-RK1_product_1 seq QLOD02000001.1:filter(unmasked) start 91258 length 2601
AAGCGGATTGAGAAGTGGTGGTGGTAGTAGCAGTCATGTGGGTAACGAAGACTACAACAGCAGTATTATA
ATTAGGAAAAGGTTTGAAGAAAAGATGAGGCTTGAAAGGGACGACGACGACGACAAGATCTTCAATCCCA
CCAAGTACTTTGTCCAAGAAGTTGTTAATTGCTTTGATGAGTCTGACCTCTACAGAACT...
Ipcress result
Experiment: Primer2
Primers: B A
Target: QLOD02000001.1:filter(unmasked), whole genome shotgun sequence
Matches: 20/20 20/20
Product: 854 bp (range 100-5000)
Result type: revcomp
ipcress: QLOD02000001.1:filter(unmasked) Primer2 854 B 149835 0 A 150669 0 revcomp
>F-RK3_product_1 seq QLOD02000001.1:filter(unmasked) start 149835 length 854
AGGATGACATGGGAATCTGGGACCTCAACCATTTTGTCTAGCTCTCTCCCAAGAGAAAGCGACGAAAATG
ACATGGGTTTGGCTCTGTATTGTTTAACAAATTTAAGTGGCTTAAAAACTCTAC....
Is there any way to linearize these FASTA sequences (and only the sequences, leaving every other line unchanged)? I would like my final file to look like this:
Ipcress result
Experiment: Primer1
Primers: B A
Target: QLOD02000001.1:filter(unmasked), whole genome shotgun sequence
Matches: 20/20 20/20
Product: 2601 bp (range 100-5000)
Result type: revcomp
ipcress: QLOD02000001.1:filter(unmasked) Primer1 2601 B 91258 0 A 93839 0 revcomp
>F-RK1_product_1 seq QLOD02000001.1:filter(unmasked) start 91258 length 2601
AAGCGGATTGAGAAGTGGTGGTGGTAGTAGCAGTCATGTGGGTAACGAAGACTACAACAGCAGTATTATAATTAGGAAAAGGTTTGAAGAAAAGATGAGGCTTGAAAGGGACGACGACGACGACAAGATCTTCAATCCCACCAAGTACTTTGTCCAAGAAGTTGTTAATTGCTTTGATGAGTCTGACCTCTACAGAACT...
Ipcress result
Experiment: Primer2
Primers: B A
Target: QLOD02000001.1:filter(unmasked), whole genome shotgun sequence
Matches: 20/20 20/20
Product: 854 bp (range 100-5000)
Result type: revcomp
ipcress: QLOD02000001.1:filter(unmasked) Primer2 854 B 149835 0 A 150669 0 revcomp
>F-RK3_product_1 seq QLOD02000001.1:filter(unmasked) start 149835 length 854
AGGATGACATGGGAATCTGGGACCTCAACCATTTTGTCTAGCTCTCTCCCAAGAGAAAGCGACGAAAATGACATGGGTTTGGCTCTGTATTGTTTAACAAATTTAAGTGGCTTAAAAACTCTAC....
If you are asking how to unwrap lines between a line which starts with > (a FASTA header) and an empty line, that is quite easy:
awk '/^>/ { wrap=1; print; next }
wrap && /^$/ { print wrapped; wrapped = ""; wrap = 0 }
wrap { wrapped = wrapped $0; next }
1
END { if (wrap) print wrapped }' file >newfile
Recall that Awk examines one line at a time. If we see a FASTA header, we set wrap to 1 so we can remember this fact, print the current line, and skip to the next line. On subsequent lines, if wrap is set and we see an empty line, we print whatever we have collected so far (the collecting itself is done by the next line of the script), reset wrapped, and stop collecting. Otherwise, if we get this far in the script and wrap is true, we append the current line to the end of wrapped and skip to the next input line. Anything not covered by the previous cases, including the empty line that ended the sequence, is simply printed. (The Awk idiom 1 is a shorthand which does this.) Finally, if we are still collecting when the input ends, we print what is in wrapped too.
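To see the rules above in action, here is a small self-contained sketch. The file names sample.txt and unwrapped.txt, the target name X, and the shortened sequences are made up for illustration, and the sample assumes that an empty line follows each wrapped sequence, as the script expects.

```shell
# Hypothetical miniature input: two records, each FASTA block ended by
# an empty line or by the end of the file. Sequences are invented.
cat > sample.txt <<'EOF'
Ipcress result
Experiment: Primer1
>F-RK1_product_1 seq X start 1 length 12
AAGCGG
ATTGAG

Ipcress result
Experiment: Primer2
>F-RK3_product_1 seq X start 5 length 8
AGGA
TGAC
EOF

# Same script as above, only the input and output names differ.
awk '/^>/ { wrap=1; print; next }
wrap && /^$/ { print wrapped; wrapped = ""; wrap = 0 }
wrap { wrapped = wrapped $0; next }
1
END { if (wrap) print wrapped }' sample.txt > unwrapped.txt

cat unwrapped.txt
```

Each sequence should now sit on a single line directly under its header, with all other lines untouched. Note that the script only stops collecting at an empty line or at the end of the file; if a FASTA block is followed directly by a non-empty line, that line is appended to the sequence as well, so the end condition would need adjusting for such input.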