pythonpython-3.xstring

Getting list of unique substrings of a given string


The task is to obtain a unique list of substrings in python.

I am currently using the breakup of the problem into 2 parts: obtain a list of all the substrings, followed by obtaining unique substrings.

I am using the below code:

substrings=[]
for i in range(0,len(inputstring)+1):
    for j in range(i+1,len(inputstring)+1):
        substr=inputstring[i:j]
        substrings.append(substr)
uniq=[]
for ss in substrings:
    if ss not in uniq:
        uniq.append(ss)

Is there a faster way to solve this problem or a so-called python way of doing it in a more flexible way?

a simple example string being: "aabaa", possible substrings are [a,a,b,a,a,aa,ab,ba,aa,aab,aba,baa,aaba,abaa,aabaa], unique substring which is desired at the end [a,b,aa,ab,ba,aab,aba,baa,aaba,abaa,aabaa]


Solution

  • Use Itertools and Set. Similar to the answer of Edwin but with Itertools, and in one line.

    import itertools
    
    uniq=list(set([inputstring[x:y] for x, y in itertools.combinations(
                range(len(inputstring) + 1), r = 2)]))
    

    Basically you use itertools to first find all combinations, then set to find unique elements, then cast to list.

    Code for combinations taken from https://www.geeksforgeeks.org/python-get-all-substrings-of-given-string/

    Edit for a clearer explanation: First, use combinations to get all pair of indexes corresponding to substrings. The trick here is that itertools.combinations starts with all (0,X) pairs, and then (1,X) pairs, etc. Since we are using combinations and not permutations, we eliminate thus reverse substrings such as (1,0) since they will have been seen in the (0,X) enumerations.

    Then simply use these with a list comprehension to get all substrings, use a set to find unique elements, and cast to a list.

    Hope that helps