I'm new to Python and I'm trying to compute the Hamming distance between a pair of DNA sequences. I managed to do this for one pair, but I don't know how to get a list of Hamming distances for more than one pair of sequences. Could anyone guide me on this?
dna1 = 'ACCTAT'
dna2 = 'CATTGA'

def distance(strand_a, strand_b):
    if len(strand_a) == len(strand_b):
        i = 0  # current position in the strings
        n = 0  # number of mismatches found so far
        while i < len(strand_a):
            if strand_a[i] != strand_b[i]:
                n += 1
            i += 1
        return n
    else:
        raise ValueError("The strings are not the same length")

print("The distance is:", distance(dna1, dna2))
Output:
The distance is: 5
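For a single pair, the same count can be written more compactly by letting `zip` pair up the characters position by position and letting `sum` count the mismatches. This is a minimal sketch, not the code above; the function name `hamming` is hypothetical, and it keeps the same length check as `distance`:

```python
dna1 = 'ACCTAT'
dna2 = 'CATTGA'

def hamming(strand_a, strand_b):
    # Same length check as distance(): Hamming distance needs equal lengths
    if len(strand_a) != len(strand_b):
        raise ValueError("The strings are not the same length")
    # zip pairs characters position by position; each True comparison counts as 1
    return sum(a != b for a, b in zip(strand_a, strand_b))

print("The distance is:", hamming(dna1, dna2))  # The distance is: 5
```

Because `True` behaves as 1 inside `sum`, the generator expression yields the number of differing positions directly, without a manual index or counter.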
What would be the best way to get a list of Hamming distances for three pairs of DNA sequences? I tried to adapt the code above myself, but I haven't been able to find a solution.
Given these two lists, I want the Hamming distance for each pair of sequences at the same position (1st with 1st, 2nd with 2nd, 3rd with 3rd):
dna1 = ['ACTGG','ATGCA','AACTG']
dna2 = ['ACTGA','ATGGG','ATGAC']
The expected output is:
distances = [1, 2, 4]
Thank you all for your help!
You can try:
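import numpy as np
dna1 = ['ACTGG','ATGCA','AACTG']
dna2 = ['ACTGA','ATGGG','ATGAC']
# Compare each pair character by character and count the mismatches;
# int() turns the NumPy integer into a plain Python int so the list prints cleanly.
distances = [int((np.array(list(x)) != np.array(list(y))).sum()) for x, y in zip(dna1, dna2)]
print(distances)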
It gives:
[1, 2, 4]
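If every sequence in both lists has the same length, NumPy can also compare all pairs in one step: stack each list into a 2-D array of single characters (one row per sequence) and sum the mismatches along each row. This is a sketch under that equal-length assumption; the names `a` and `b` are illustrative:

```python
import numpy as np

dna1 = ['ACTGG', 'ATGCA', 'AACTG']
dna2 = ['ACTGA', 'ATGGG', 'ATGAC']

# Assumes all sequences share one length, so each list becomes a
# rectangular (n_pairs, seq_len) array of single characters.
a = np.array([list(s) for s in dna1])
b = np.array([list(s) for s in dna2])

# Element-wise comparison, then count mismatches along each row (one row per pair).
# tolist() converts the NumPy integers to plain Python ints.
distances = (a != b).sum(axis=1).tolist()
print(distances)  # [1, 2, 4]
```

If the sequences can differ in length, the arrays are no longer rectangular, so a per-pair approach such as `[distance(x, y) for x, y in zip(dna1, dna2)]`, using the `distance` function from the question, is simpler and still raises `ValueError` for a mismatched pair. Note that `zip` silently stops at the shorter of the two lists; on Python 3.10+ `zip(dna1, dna2, strict=True)` raises an error instead.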