I have a multiple sequence alignment (MSA) file produced by MAFFT in Clustal format, which I want to import into Python and save as a PDF file. I need to import the file and then highlight some specific words. I tried simply importing a PDF of the MSA, but then the highlight command doesn't work.
I need to print the file like this:
CLUSTAL format alignment by MAFFT FFT-NS-i (v7.453)
Consensus --------------------------------------acgttttcgatatttatgccat
AMP tttatattttctcctttttatgatggaacaagtctgcgacgttttcgatatttatgccat
**********************
Consensus atgtgcatgttgtaaggttgaaagcaaaaatgaggggaaaaaaaatgaggtttttaataa
AMP atgtgcatgttgtaaggttgaaagcaaaaatgaggggaaaaaaaatgaggtttttaataa
************************************************************
Consensus ctacacatttagaggtctaggaaataaaggagtattaccatggaaatgtatttccctaga
AMP ctacacatttagaggtctaggaaataaaggagtattaccatggaaatgtaattccctaga
************************************************** *********
Consensus tatgaaatattttcgtgcagttacaacatatgtgaatgaatcaaaatatgaaaaattgaa
AMP tatgaaatattttcgtgcagttacaacatatgtgaatgaatcaaaatatgaaaaattgaa
************************************************************
Consensus atataagagatgtaaatatttaaacaaagaaactgtggataatgtaaatgatatgcctaa
AMP atataagagatgtaaatatttaaacaaagaaactgtggataatgtaaatgatatgcctaa
************************************************************
Consensus ttctaaaaaattacaaaatgttgtagttatgggaagaacaaactgggaaagcattccaaa
AMP ttctaaaaaattacaaaatgttgtagttatgggaagaacaaactgggaaagcattccaaa
************************************************************
Consensus aaaatttaaacctttaagcaataggataaatgttatattgtctagaaccttaaaaaaaga
AMP aaaatttaaacctttaagcaataggataaatgttatattgtctagaaccttaaaaaaaga
************************************************************
Consensus agattttgatgaagatgtttatatcattaacaaagttgaagatctaatagttttacttgg
AMP agattttgatgaagatgtttatatcattaacaaagttgaagatctaatagttttacttgg
************************************************************
Consensus gaaattaaattactataaatgttttattataggaggttccgttgtttatcaagaattttt
AMP gaaattaaattactataaatgttttattataggaggttccgttgtttatcaagaattttt
************************************************************
Consensus agaaaagaaattaataaaaaaaatatattttactagaataaatagtacatatgaatgtga
AMP agaaaagaaattaataaaaaaaatatattttactagaataaatagtacatatgaatgtga
************************************************************
Consensus tgtattttttccagaaataaatgaaaatgagtatcaaattatttctgttagcgatgtata
AMP tgtattttttccagaaataaatgaaaatgagtatcaaattatttctgttagcgatgtata
************************************************************
Consensus tactagtaacaatacaacattgga----------------------------------
AMP tactagtaacaatacaacattggattttatcatttataagaaaacgaataataaaatg
************************
How can I import the alignment and print it to a new PDF with the sequences correctly aligned?
Thanks
OK, I figured out a way, though I'm not sure it's the best one.
You need to install fpdf2 (pip install fpdf2).
from io import StringIO

from Bio import AlignIO  # Biopython 1.80
from fpdf import FPDF  # pip install fpdf2

# Read the alignment and write it back out as Clustal text in memory
alignment = AlignIO.read("Multi.txt", "clustal")
stri = StringIO()
AlignIO.write(alignment, stri, "clustal")
# print(stri.getvalue())
stri_lines = stri.getvalue().split("\n")
# print(stri_lines)

pdf = FPDF(orientation="P", unit="mm", format="A4")
# Add a page
pdf.add_page()
# A monospaced font keeps the alignment columns lined up
pdf.add_font("FreeMono", "", "FreeMono.ttf")
pdf.set_font("FreeMono", size=8)
# One cell per alignment line, each starting at the left margin
# (newer fpdf2 releases name the `txt` parameter `text`)
for x in stri_lines:
    pdf.cell(0, 5, txt=x, border=0, new_x="LMARGIN", new_y="NEXT", align="L", fill=False)
    # print(len(x))
pdf.output("out.pdf")
Output PDF: out.pdf
I'm not sure why the file header is changed; I think it is something Biopython does when it writes the alignment. You can check by adding:
with open('file_output.txt', 'w') as filehandler:
    AlignIO.write(alignment, filehandler, 'clustal')
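If the original MAFFT header matters, one option might be to skip the Biopython round trip for the printing step and read the lines straight from the file. The sketch below assumes the file is plain text; `read_alignment_lines` is a hypothetical helper name, not part of Biopython or fpdf2.

```python
from pathlib import Path


def read_alignment_lines(path):
    """Return the lines of a Clustal file exactly as written, header included."""
    # No parsing happens here, so the first line stays whatever MAFFT wrote.
    return Path(path).read_text().splitlines()
```

The returned list could then be used in place of `stri_lines` in the printing loop, at the cost of losing the validation that `AlignIO.read` performs.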
I had to place a TTF FreeMono font (a monospaced font) in my script directory, see pdf.add_font('FreeMono', '', 'FreeMono.ttf');
otherwise the alignment won't be printed correctly.
See also: Which fonts have the same width for every character?
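Because every character in a monospaced font has the same width, it is possible to estimate up front how many columns fit on a line. The sketch below is only a rough check under stated assumptions: the glyph advance is taken as 0.6 em (the value used by Courier; FreeMono's metrics should be checked separately), and `max_columns` is a hypothetical helper name.

```python
import math

MM_PER_PT = 25.4 / 72  # one PostScript point expressed in millimetres


def max_columns(font_size_pt, usable_width_mm, advance_em=0.6):
    """Estimate how many fixed-width characters fit on one line."""
    # advance_em is an assumption: 0.6 em matches Courier-style fonts.
    char_width_mm = advance_em * font_size_pt * MM_PER_PT
    return math.floor(usable_width_mm / char_width_mm)
```

For example, `max_columns(8, 190)` estimates the capacity of an A4 portrait page at 8 pt, assuming 10 mm left and right margins (190 mm of usable width). If the longest alignment line is shorter than that, it should not be cut off at the right margin.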
Attached is a PNG of my PDF. Note that you can highlight lines in it,
using:
pdf.set_fill_color(255, 255, 0)  # yellow
for x in stri_lines:
    # Fill the background only for the Consensus lines
    filling = "Consensus" in x
    pdf.cell(0, 5, txt=x, border=0, new_x="LMARGIN", new_y="NEXT", align="L", fill=filling)
With this, or something similar, you can highlight while printing.
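To highlight a specific word (for example a short motif) rather than a whole line, one possible approach is to split each line around the matches and write one cell per piece. This is a minimal sketch, not a tested recipe: `split_for_highlight` and `write_highlighted` are hypothetical helper names, and setting `pdf.c_margin = 0` is an assumption made so that the pieces sit next to each other without padding.

```python
import re


def split_for_highlight(line, motif):
    """Split `line` into (text, is_match) pieces around each `motif` hit."""
    if not motif:
        raise ValueError("motif must be a non-empty string")
    pieces = []
    pos = 0
    for match in re.finditer(re.escape(motif), line):
        if match.start() > pos:
            pieces.append((line[pos:match.start()], False))
        pieces.append((match.group(), True))
        pos = match.end()
    if pos < len(line):
        pieces.append((line[pos:], False))
    return pieces


def write_highlighted(pdf, lines, motif, line_height=5):
    """Write `lines` to an fpdf2 document, filling only the motif cells."""
    pdf.c_margin = 0  # assumption: no cell padding keeps the columns contiguous
    pdf.set_fill_color(255, 255, 0)
    for line in lines:
        for text, is_match in split_for_highlight(line, motif):
            # The text is passed positionally because its keyword name
            # differs between fpdf2 releases (`txt` vs `text`).
            pdf.cell(pdf.get_string_width(text), line_height, text, fill=is_match)
        pdf.ln(line_height)  # back to the left margin, one line down
```

It could be called as `write_highlighted(pdf, stri_lines, "atgtg")` after `pdf.set_font(...)` and before `pdf.output(...)`. A motif that is split across two alignment blocks will not be found, since each line is searched on its own.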