python, levenshtein-distance, character-replacement

Is there a Python function to make all possible substitutions between two equal-length strings?


I am trying to make all possible substitutions between a reference and a test sequence. The sequences will always be the same length and the goal is to substitute Test characters with those of Ref.

Ref = "AAAAAAAAA"
Test = "AAATAATTA"

Desired output:

AAATAATTA, AAATAAAAA,  AAATAATAA,  AAATAATTA,  AAAAAATTA,  AAAAAATAA,  AAAAAAATA

Solution

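  • You can use itertools.product for this if you zip the two strings together (turning them into a sequence of 2-tuples, one per position, so that product can pick one character from each pair). Positions where the two strings already agree produce duplicate results, so you then probably want to uniquify them in a set. All together it looks like this: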

    >>> from itertools import product
    >>> {''.join(t) for t in product(*zip(Ref, Test))}
    {'AAAAAAAAA', 'AAAAAATAA', 'AAAAAAATA', 'AAATAATTA', 'AAATAATAA', 'AAATAAAAA', 'AAATAAATA', 'AAAAAATTA'}
    

    To break that down a little further, since it looks a bit like line noise if you aren't familiar with the functions in question...

    Here's the zip that turns our two strings into an iterable of pairs (wrapped in a list comprehension here only so that it prints; in the next stage the zip is passed straight to product):

    >>> [t for t in zip(Ref, Test)]
    [('A', 'A'), ('A', 'A'), ('A', 'A'), ('A', 'T'), ('A', 'A'), ('A', 'A'), ('A', 'T'), ('A', 'T'), ('A', 'A')]
    

    The product function takes an arbitrary number of iterables as arguments; we want to feed it all of our 2-tuples as separate arguments using *:

    >>> [t for t in product(*zip(Ref, Test))]
    [('A', 'A', 'A', 'A', 'A', 'A', 'A', 'A', 'A'), ('A', 'A', 'A', 'A', 'A', 'A', 'A', 'A', 'A'), ... (a whole lot of tuples)
    

    Use join to turn those tuples back into strings:

    >>> [''.join(t) for t in product(*zip(Ref, Test))]
    ['AAAAAAAAA', 'AAAAAAAAA', 'AAAAAAATA', 'AAAAAAATA', ... (still a whole lot of strings)
    

    And by making this a set comprehension ({}) instead of a list comprehension ([]), we get just the unique elements.
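
For reuse, the one-liner above can be wrapped in a function. The sketch below also shows a variant that only branches at positions where the two strings differ, so product never generates duplicates in the first place (2**k results for k mismatches, instead of 2**n tuples that are then deduplicated). The function names all_substitutions and substitutions_at_mismatches are made up for this illustration; they are not part of itertools or any library.

```python
from itertools import product


def all_substitutions(ref, test):
    """Return every string built by taking each position from ref or test."""
    if len(ref) != len(test):
        raise ValueError("ref and test must be the same length")
    # zip pairs the characters position by position, product picks one
    # character from each pair, and the set drops the duplicates that
    # arise wherever the two strings already agree.
    return {"".join(chars) for chars in product(*zip(ref, test))}


def substitutions_at_mismatches(ref, test):
    """Same strings, but only branch where ref and test differ."""
    if len(ref) != len(test):
        raise ValueError("ref and test must be the same length")
    # A position where both strings agree offers a single choice, so
    # product yields exactly 2**k tuples (k = number of mismatches)
    # and no deduplication is needed.
    choices = [(r,) if r == t else (r, t) for r, t in zip(ref, test)]
    return ["".join(chars) for chars in product(*choices)]


Ref = "AAAAAAAAA"
Test = "AAATAATTA"

results = substitutions_at_mismatches(Ref, Test)
print(len(results))  # 8, since the strings differ at 3 positions
print(set(results) == all_substitutions(Ref, Test))  # True
```

For strings this short the difference is negligible, but with long sequences that differ in only a few places the second version avoids building a huge number of identical strings just to throw them away.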