I am trying to make all possible substitutions between a reference and a test sequence. The sequences will always be the same length and the goal is to substitute Test characters with those of Ref.
Ref= "AAAAAAAAA"
Test="AAATAATTA"
Desired output:
AAATAATTA, AAATAAAAA, AAATAATAA, AAATAATTA, AAAAAATTA, AAAAAATAA, AAAAAAATA
You can use itertools.product for this if you zip the two strings together (turning them into a series of 2-tuples for product to find combinations of). You then probably want to deduplicate the results with a set. All together it looks like this:
>>> from itertools import product
>>> {''.join(t) for t in product(*zip(Ref, Test))}
{'AAAAAAAAA', 'AAAAAATAA', 'AAAAAAATA', 'AAATAATTA', 'AAATAATAA', 'AAATAAAAA', 'AAATAAATA', 'AAAAAATTA'}
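For use outside the interactive prompt, the same expression can be wrapped in a small function. This is only a minimal sketch: the name substitutions and the length check are my own additions for illustration, not something itertools provides.

```python
from itertools import product


def substitutions(ref, test):
    """Return every string formed by picking, at each position,
    either the character from ref or the one from test."""
    if len(ref) != len(test):
        raise ValueError("sequences must be the same length")
    # zip pairs up the characters; product picks one from each pair;
    # the set comprehension drops the duplicates.
    return {''.join(chars) for chars in product(*zip(ref, test))}


Ref = "AAAAAAAAA"
Test = "AAATAATTA"
result = substitutions(Ref, Test)
for s in sorted(result):
    print(s)
```

Sorting is only there to make the printed order stable, since sets have no defined order.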
To break that down a little further, since it looks a bit like line noise if you aren't familiar with the functions in question...
Here's the zip that turns our two strings into an iteration of pairs (wrapping it in a list comprehension for easy printing, but we'll remove that in the next stage):
>>> [t for t in zip(Ref, Test)]
[('A', 'A'), ('A', 'A'), ('A', 'A'), ('A', 'T'), ('A', 'A'), ('A', 'A'), ('A', 'T'), ('A', 'T'), ('A', 'A')]
The product function takes an arbitrary number of iterables as arguments; we want to feed it all of our 2-tuples as separate arguments using *:
>>> [t for t in product(*zip(Ref, Test))]
[('A', 'A', 'A', 'A', 'A', 'A', 'A', 'A', 'A'), ('A', 'A', 'A', 'A', 'A', 'A', 'A', 'A', 'A'), ... (a whole lot of tuples)
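If the * unpacking is the unfamiliar part, a smaller case may help. The sketch below uses two made-up two-character strings (not the ones from the question) to show that unpacking the zipped pairs is the same as passing each pair to product by hand.

```python
from itertools import product

# Two short, made-up strings so the full output stays readable.
pairs = list(zip("AT", "TT"))  # [('A', 'T'), ('T', 'T')]

# * spreads the list into separate positional arguments,
# so these two calls receive exactly the same inputs.
unpacked = list(product(*pairs))
explicit = list(product(('A', 'T'), ('T', 'T')))

print(unpacked == explicit)
print(unpacked)
```

Note that the second pair holds the same character twice, so each result shows up twice in the list; this is the same duplication seen with the full-length strings.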
Use join to turn those tuples back into strings:
>>> [''.join(t) for t in product(*zip(Ref, Test))]
['AAAAAAAAA', 'AAAAAAAAA', 'AAAAAAATA', 'AAAAAAATA', ... (still a whole lot of strings)
And by making this a set comprehension ({}) instead of a list comprehension ([]), we get just the unique elements.
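One possible refinement, offered as a sketch rather than a necessary change: the full product over nine pairs generates 2**9 = 512 tuples, most of them duplicates, because positions where Ref and Test agree still count as two choices. Collapsing each pair to its distinct characters first means product only produces the unique strings. The variable names choices and unique are my own.

```python
from itertools import product

Ref = "AAAAAAAAA"
Test = "AAATAATTA"

# dict.fromkeys keeps one copy of each character per position, in order,
# so ('A', 'A') becomes a single choice and ('A', 'T') stays two choices.
choices = [tuple(dict.fromkeys(pair)) for pair in zip(Ref, Test)]

# No set needed: every combination of distinct choices is a distinct string.
unique = [''.join(chars) for chars in product(*choices)]

print(len(unique))  # 8, i.e. 2**3, since three positions differ
print(unique)
```

For strings this short the difference is negligible, but the number of wasted tuples doubles with every matching position, so it could matter for long sequences.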