pythonreversefastadna-sequencecomplement

I am trying to reverse complement a fasta DNA sequence


I have been trying to reverse complement a fasta DNA sequence. Here is my code:

fastafile=open('sequence (3).fasta','r')
entries=[]
reverse=""
sequence=['A','T','G','C','N']
for line in fastafile:
    if not line.startswith('>'):
        line = line.split()
        entries.append(line)
print entries
for index in range(0,len(entries[::-1])):
    if index !=sequence:
        print "this is not a valid nucleotide"
        break
    else:
        if index=='A':
            reverse+='T'
        elif index=='T':
            reverse+='A'
        elif index=='C':
            reverse+='G'
        elif index=='G':
            reverse+ 'C'
        elif index=='N':
            reverse+='N'
print reverse

And each time I get the output, this is not a valid nucleotide, even when my print of entries show that it has the items in sequence. Here is a sample of the output when I print enteries;

[['GCTCCCCTGAGGTTCGGCACCCACACTCCCTTCCCAGGAGCTCGCGATGCAAGAGCCACAGTCAGAGCTC'], ['AATATCGACCCCCCTCTGAGCCAGGAGACATTTTCAGAATTGTGGAACCTGCTTCCTGAAAACAATGTTC'], ['TGTCTTCGGAGCTGTGCCCAGCAGTGGATGAGCTGCTGCTCCCAGAGAGCGTCGTGAACTGGCTAGACGA']

How can I override this problem? I just want to add that I only started seriously programming with python about 2 months ago so I am still learning and improving. Thank you!


Solution

  • Your loop statement is:

    for index in range(0,len(entries[::-1])):
    

    This will iterate over the length of entries, that is, 0, 1, 2, 3, ..., len(entries).

    When you do if index != sequence you are actually comparing an integer with a list, say if 3 != ['A', 'C', 'T', 'G']. I assume you can see that makes no sense. What you probably want to do is see if the nucleotide in the sequence is a valid nucleotide, ergo it is in the sequence list. You can do it like this:

    if entries[::-1][index] in sequence # Will be true if the nucleotide at entries[::-1][index] is inside sequence
    

    Let me say two things:

    -

    from Bio.Seq import Seq
    
    dna = Seq("ATGGCCATTGTAATGGGCCGCTGAAAGGGTGCCCGATAG")
    print dna.reverse_complement()
    

    Output: CTATCGGGCACCCTTTCAGCGGCCCATTACAATGGCCAT