pythonstringcombinationsrosalind

Rosalind's - 'Ordering Strings of Varying Length Lexicographically' Is it possible to sort my result?


I tried to solve another of Rosalind(http://rosalind.info/problems/lexv/) problems by myself, however unfortunately I have to ask you for help.

Here is my approach:

First of all; function which creates all possible substrings of input string with length of n:

def get_substrings(input_string, l):
res_list = []
sub = []
for i in range(len(input_string)):
    if l+i <= len(input_string):
        for j in range(i,l+i):
            sub.append(input_string[j])
    sub = ''.join(sub)
    res_list.append(sub)
    sub = []
res_list = filter(None, res_list)
return res_list

Then main function which creates all combinations of current string with varying length:

from itertools import product
def lexv():
dna = str(raw_input())
n = int(raw_input())
subs = get_substrings(dna, n)
result = []
for i in range(len(subs)):
    for j in range(1,n+1):
        result = result + list(product(dna, repeat=j))
for i in range(len(result)):
     result[i]  = "".join(result[i])
     print result[i]

The result of my code for data from Rosalind's 'Sample input' is:

D
N
A
DD
DN
DA
ND
NN
NA
AD
AN
AA
DDD
DDN
DDA
DND
DNN
DNA
DAD
DAN
DAA
NDD
NDN
NDA
NND
NNN
NNA
NAD
NAN
NAA
ADD
ADN
ADA
AND
ANN
ANA
AAD
AAN
AAA

My questions:

a) Is it possible to order my result as it should be (Rosalind's result)?

b) Is my approach correct? If not, could you give me some clue (but not the solution for this problem - I'd like to beat it by myself).

Very thanks!


Solution

  • Generating possible substrings

    For the generation of possible substrings, I would look deeper into the itertools module. With chain and product you can easily make a one-lines that returns all combinations

    Sorting

    for the sorting, I would replace the letters themselves with ints, keeping this translation table in a dict. enumerate and str.split are your friends here.

    Now you have a list of tuples of ints, which you can sort. From what I can tell, the standard sorting order of tuples is useful.

    When you have a sorted list of tuples, you will just need to do the backwards translation to strings

    sorted(key=)

    Another option is to use standard string sorting, and as key= argument to sorted, passing a function (or lambda) which replaces each letter in the string to it's place in the alphabet (the input_string)

    This can be done with str.translate, enumerate and a dict comprehension