pythonlistbioinformaticsbiopythondna-sequence

Reverse complement of DNA strand using Python


I have a DNA sequence and would like to get reverse complement of it using Python. It is in one of the columns of a CSV file and I'd like to write the reverse complement to another column in the same file. The tricky part is, there are a few cells with something other than A, T, G and C. I was able to get reverse complement with this piece of code:

def complement(seq):
    complement = {'A': 'T', 'C': 'G', 'G': 'C', 'T': 'A'} 
    bases = list(seq) 
    bases = [complement[base] for base in bases] 
    return ''.join(bases)
    def reverse_complement(s):
        return complement(s[::-1])

    print "Reverse Complement:"
    print(reverse_complement("TCGGGCCC"))

However, when I try to find the item which is not present in the complement dictionary, using the code below, I just get the complement of the last base. It doesn't iterate. I'd like to know how I can fix it.

def complement(seq):
    complement = {'A': 'T', 'C': 'G', 'G': 'C', 'T': 'A'} 
    bases = list(seq) 
    for element in bases:
        if element not in complement:
            print element  
        letters = [complement[base] for base in element] 
        return ''.join(letters)
def reverse_complement(seq):
    return complement(seq[::-1])

print "Reverse Complement:"
print(reverse_complement("TCGGGCCCCX"))

Solution

  • The get method of a dictionary allows you to specify a default value if the key is not in the dictionary. As a preconditioning step I would map all your non 'ATGC' bases to single letters (or punctuation or numbers or anything that wont show up in your sequence), then reverse the sequence, then replace the single letter alternates with their originals. Alternatively, you could reverse it first and then search and replace things like sni with ins.

    alt_map = {'ins':'0'}
    complement = {'A': 'T', 'C': 'G', 'G': 'C', 'T': 'A'} 
    
    def reverse_complement(seq):    
        for k,v in alt_map.iteritems():
            seq = seq.replace(k,v)
        bases = list(seq) 
        bases = reversed([complement.get(base,base) for base in bases])
        bases = ''.join(bases)
        for k,v in alt_map.iteritems():
            bases = bases.replace(v,k)
        return bases
    
    >>> seq = "TCGGinsGCCC"
    >>> print "Reverse Complement:"
    >>> print(reverse_complement(seq))
    GGGCinsCCGA