python loops iterator dna-sequence

Iterating and matching pairs in a list [DNA sequences] to append the values


Hello,

I'm trying to write a for loop that reads a list of DNA sequences and gets the value for every pair of adjacent bases. The idea is to read the current and the next item, match that pair with its specific value, and then append the value to a final list.

This is an example:
AA= 5
AT=6
AC=13
AG=8
CA= 6
TG= 12
...[etc.]
DNA_seq= [A,A,C,A,T,G]
These 5 pairs (AA,AC,CA,AT,TG) should give me a value of 42

So, this is what I'm trying. I first define a function to get the next item:
(I know there is a built-in next function, but calling it directly wasn't working either)


    def nextbase():
        next_base= next(base)
        return next_base

And then:


    AA=5
    AT=4
    AC=3
    AG=2
    TA=5
    TT=4
    TC=3
    TG=2
    CA=5
    CT=4
    CC=3
    CG=2
    GA=5
    GT=4
    GC=3
    GG=2
    
    stacking= []
    for strand in dsDNA:
        for b in strand:
            base= iter(b)
            if base =='A':
                if nextbase() == 'A':
                    append.stacking(AA)
                elif nextbase() == 'T':
                    append.stacking(AT)
                elif nextbase() == 'C':
                    append.stacking(AC)
                elif nextbase() == 'G':
                    append.stacking(AG)
            elif base=='G':
                if nextbase() == 'A':
                    append.stacking(GA)
                elif nextbase() == 'T':
                    append.stacking(GT)
                elif nextbase() == 'C':
                    append.stacking(GC)
                elif nextbase() == 'G':
                    append.stacking(GG)
            elif base=='c':
                if nextbase() == 'A':
                    append.stacking(CA)
                elif nextbase() == 'T':
                    append.stacking(CT)
                elif nextbase() == 'C':
                    print('yes')
                    append.stacking(CC)
                elif nextbase() == 'G':
                    append.stacking(CG)
            elif base=='T':
                if nextbase() == 'A':
                    append.stacking(TA)
                elif nextbase() == 'T':
                    append.stacking(TT)
                elif nextbase() == 'C':
                    append.stacking(TC)
                elif nextbase() == 'G':
                    append.stacking(TG)
            else:
                print('eror') 
    print(stacking)

    

But it's just not working: it only prints the error message because it's not recognising any of the bases. Does anyone know if there is an efficient way to do this? Thanks!!


Solution

  • This is not too hard to do: first create a dictionary with the 'weight' of each pair. Then loop over the DNA sequence, look up each pair of adjacent bases in that dictionary, and sum the values:

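    # Weight of each pair of adjacent bases.
    # Named `weights` so the built-in `dict` is not shadowed.
    weights = {'AA': 5,
               'AT': 4,
               'AC': 3,
               'AG': 2,
               'TA': 5,
               'TT': 4,
               'TC': 3,
               'TG': 2,
               'CA': 5,
               'CT': 4,
               'CC': 3,
               'CG': 2,
               'GA': 5,
               'GT': 4,
               'GC': 3,
               'GG': 2}

    DNA_seq = ['A', 'A', 'C', 'A', 'T', 'G']
    # Join each base with its successor to form the dictionary key.
    total = sum(weights[DNA_seq[i] + DNA_seq[i + 1]] for i in range(len(DNA_seq) - 1))

    print(total)  # 19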
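
If you also want the individual values appended to a list, as in the question, here is a minimal sketch that pairs each base with its successor using `zip` instead of indexing. The names `pair_weights` and `stacking_values` are my own, and I'm assuming `dsDNA` is a list of strands, each one a string or a list of single-letter bases; the two sample strands are made up for illustration.

```python
# Hypothetical names; the weights are copied from the code in the question.
pair_weights = {
    'AA': 5, 'AT': 4, 'AC': 3, 'AG': 2,
    'TA': 5, 'TT': 4, 'TC': 3, 'TG': 2,
    'CA': 5, 'CT': 4, 'CC': 3, 'CG': 2,
    'GA': 5, 'GT': 4, 'GC': 3, 'GG': 2,
}


def stacking_values(strand):
    """Return the weight of every pair of adjacent bases in one strand."""
    # zip(strand, strand[1:]) yields (current, next) for each position,
    # so no explicit iterator or index bookkeeping is needed.
    return [pair_weights[a + b] for a, b in zip(strand, strand[1:])]


# Assumed input shape: a list of strands (sample data, not from the question).
dsDNA = ['AACATG', 'CATGTT']
stacking = [stacking_values(strand) for strand in dsDNA]
print(stacking)  # [[5, 3, 5, 4, 2], [5, 4, 2, 4, 4]]
```

As for why the original loop only hits the `else` branch: `iter(b)` returns an iterator object, which never compares equal to `'A'`, so every `if`/`elif` test fails. Even past that, each `nextbase()` call in an `elif` chain would consume another item, and `append.stacking(AA)` would need to be `stacking.append(AA)`.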