pythonstringalgorithm

What is a good way to generate all strings of length n over a given alphabet within a range in dictionary order?


I want to write a generator s_generator(alphabet, length, start_s, end_s) that generates strings of length n over a given alphabet in dictionary order starting with start_s and ending at end_s.

For example, s_generator('ab', 4, 'aaaa', 'bbbb') generates ['aaaa', 'aaab', 'aaba', 'aabb', 'abaa', 'abab', 'abba', 'abbb', 'baaa', 'baab', 'baba', 'babb', 'bbaa', 'bbab', 'bbba', 'bbbb']. And s_generator('ab', 4, 'abaa', 'abaa') generates ['abaa', 'abab', 'abba', 'abbb', 'baaa']

What is a good way to implement it?

I thought about assigning a number to each character in alphabet, treating the string as a base-n number (n is size of alphabet) and using addition to get the next number, and then convert the number back to string. For example, 'abab' is [0, 1, 0, 1] and the next number is [0, 1, 1, 0], which is 'abba'. This method seems complicated. Is there a simpler solution?


Solution

  • It seems the counting method I mentioned in my question is the most efficient and cleanest solution.

    def str_generator(start_s, end_s):
        # assuming start_s < end_s and are valid strings
        start_nums, end_nums = str2nums(start_s), str2nums(end_s)
        nums = start_nums
        while nums <= end_nums:
            yield nums2str(nums)
    
            # increment by 1
            i = len(nums) - 1
            while i >= 0:
                nums[i] = nums[i] + 1
                if nums[i] < SIZE_OF_ALPHABET:
                    break
                nums[i] = 0
                i -= 1
            if i < 0:
                return
    

    str2nums(s) and nums2str(nums) are just functions that handles the mapping between strings and numbers. They can be returning hash table entries. Or in my case of using a-zA-Z as my alphabet, simply shifting the ASCII values will do:

    # ASCII value
    # A-Z: 65-90, a-z: 97-122
    
    def str2nums(s):
        nums = []
        for c in s:
            if c.islower():  # 0-25 is a-z
                nums.append(ord(c) - ord('a'))
            else:  # 26-51 is A-Z
                nums.append(ord(c) + 26 - ord('A'))
        return nums
    
    def nums2str(nums):
        str_builder = []
        for n in nums:
            if n < 26:  # 0-25 is a-z
                str_builder.append(chr(n + ord('a')))
            else:  # 26-51 is A-Z
                str_builder.append(chr(n - 26 + ord('A')))
        return ''.join(str_builder)