Here is my solution to a problem from the Rosalind project:

    def prot(rna):
        for i in xrange(3, (5*len(rna))//4+1, 4):
            rna=rna[:i]+','+rna[i:]
        rnaList=rna.split(',')
        bases=['U','C','A','G']
        codons = [a+b+c for a in bases for b in bases for c in bases]
        amino_acids = 'FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG'
        codon_table = dict(zip(codons, amino_acids))
        peptide=[]
        for i in range (len (rnaList)):
            if codon_table[rnaList[i]]=='*':
                break
            peptide+=[codon_table[rnaList[i]]]
        output=''
        for i in peptide:
            output+=str(i)
        return output

If I run prot('AUGGCCAUGGCGCCCAGAACUGAGAUCAAUAGUACCCGUAUUAACGGGUGA'), I get the correct output 'MAMAPRTEINSTRING'. However, if the RNA sequence (the input string) is hundreds of nucleotides (characters) long, I get an error:

    Traceback (most recent call last):
      File "<stdin>", line 1, in <module>
      File "<stdin>", line 11, in prot
    KeyError: 'CUGGAAACGCAGCCGACAUUCGCUGAAGUGUAG'

Can you point out where I went wrong?
Given that you have a KeyError, the problem must be in one of your attempts to access codon_table[rnaList[i]]. You are assuming each item in rnaList is three characters long, but evidently, at some point, that stops being true and one of the items is 'CUGGAAACGCAGCCGACAUUCGCUGAAGUGUAG'.
This happens because each reassignment rna = rna[:i]+','+rna[i:] makes rna one character longer, while the xrange bounds were computed once from the original length. With all the commas inserted the string is about 4*len(rna)/3 characters long, but your indices i stop at 5*len(rna)//4, so they no longer reach the end of the string. This means that for any rna where len(rna) > 60, the last item in the list will not have length 3. If there is a stop codon before you reach that item it isn't a problem, but if you reach it you get the KeyError.
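To see the cutoff concretely, here is a small sketch that copies only the comma-insertion loop into a helper (the name split_with_commas is made up for this illustration, and range stands in for xrange so it runs on Python 3). It compares a 60-nucleotide input with a 63-nucleotide one:

```python
def split_with_commas(rna):
    """Python 3 copy of the comma-insertion loop from the question."""
    # The range is built once, from the length of the ORIGINAL string,
    # so it does not grow as commas are inserted.
    for i in range(3, (5 * len(rna)) // 4 + 1, 4):
        rna = rna[:i] + "," + rna[i:]
    return rna.split(",")


short_chunks = split_with_commas("AUG" * 20)  # 60 nucleotides
long_chunks = split_with_commas("AUG" * 21)   # 63 nucleotides

print([len(chunk) for chunk in short_chunks][-3:])  # [3, 3, 3]
print([len(chunk) for chunk in long_chunks][-3:])   # [3, 3, 6]
```

At 63 nucleotides the loop runs out one comma early, so the last two codons stay glued together; with longer inputs the unsplit tail keeps growing.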
I suggest you rewrite the start of your function, e.g. using the grouper recipe from the itertools documentation:

    from itertools import izip_longest

    def grouper(iterable, n, fillvalue=None):
        "Collect data into fixed-length chunks or blocks"
        # grouper('ABCDEFG', 3, 'x') --> ABC DEF Gxx
        args = [iter(iterable)] * n
        return izip_longest(fillvalue=fillvalue, *args)

    def prot(rna):
        rnaList = ["".join(t) for t in grouper(rna, 3)]
        ...

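If you are on Python 3, izip_longest is called zip_longest. The following is a rough port of the same recipe, not a drop-in replacement; the helper name split_codons and the choice of "" as the fill value are my own assumptions. The empty fill value is there because "".join would raise a TypeError on the default None when the length is not a multiple of 3:

```python
from itertools import zip_longest


def grouper(iterable, n, fillvalue=None):
    "Collect data into fixed-length chunks or blocks"
    # n references to the SAME iterator, so each tuple consumes n items.
    args = [iter(iterable)] * n
    return zip_longest(*args, fillvalue=fillvalue)


def split_codons(rna):
    # Hypothetical helper: pad an incomplete final chunk with "" so that
    # "".join works; the short chunk is kept rather than dropped.
    return ["".join(t) for t in grouper(rna, 3, "")]


print(split_codons("AUGGCCUGA"))  # ['AUG', 'GCC', 'UGA']
print(split_codons("AUGGC"))      # ['AUG', 'GC']
```

Note that a short trailing chunk such as 'GC' would still raise a KeyError in the codon table if the loop reaches it, so this only helps when the input is a whole number of codons or ends after a stop codon.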
Note also that you can use

    peptide.append(codon_table[rnaList[i]])

and

    return "".join(peptide)

to simplify your code.
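Putting these pieces together, one possible version of the whole function for Python 3 might look like the sketch below. It keeps your codon table as is, but swaps grouper for plain slicing (a different technique that gives the same chunks here), and it assumes that a trailing incomplete codon should simply be ignored; adjust that if your input needs different handling:

```python
def prot(rna):
    bases = ['U', 'C', 'A', 'G']
    codons = [a + b + c for a in bases for b in bases for c in bases]
    amino_acids = 'FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG'
    codon_table = dict(zip(codons, amino_acids))

    peptide = []
    # Step through the string three characters at a time. Stopping at
    # len(rna) - 2 skips a trailing chunk shorter than 3 (an assumption).
    for i in range(0, len(rna) - 2, 3):
        amino_acid = codon_table[rna[i:i + 3]]
        if amino_acid == '*':  # stop codon
            break
        peptide.append(amino_acid)
    return "".join(peptide)


print(prot('AUGGCCAUGGCGCCCAGAACUGAGAUCAAUAGUACCCGUAUUAACGGGUGA'))
# MAMAPRTEINSTRING
```

Because nothing is inserted into rna while looping, the length of the input no longer matters.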