pythoncombinationspermutationcombinatoricsbijection

How to get combination by index from a list of unique combinations


The following is what I am trying to achieve:

alpha = 'abcdefghijklmnopqrstuvwxyz0123456789'
items = ['item-a','item-b'...'item-n']
my_map = defaultdict()

for i, item in enumerate(items):
        my_map[alpha[i]] = item

Here, I am creating a dictionary where each item is mapped against a single character. Every possible combination (i.e. ab7, so 3 items) can randomly be picked after which its value will be processed.

For n items a total of 2^n combinations exist. So for 6 items, the total of combinations that can be picked looks as follows:

['a','b','ab'....'bef'...'abcdef'] 

NOTE: combinations only occur once. In this situation 'ba' is the same as 'ab', therefor only 'ab' exists in the list.

Suppose I have the index of the combination, how can I get the combination that belongs to that index without calculating all possible combinations?

I have played around with the following but this only works for all possible combinations

import math

def get_bijective_val(n, alphabet):
    base = len(alphabet)
    digits = []
    while n:
        remainder = math.ceil(n/base)-1
        digits.append(n - remainder*base)
        n = remainder
    digits.reverse()
    return "".join(alphabet[digit-1] for digit in digits)

get_bijective_val(50,'abcdef')

(taken from https://stackoverflow.com/a/20446640/1251070)

I have tried to modify it but am unable to find the solution

This is the expected result:

>>> print(get_bijective_val(0, 'abcdef'))
>>> 'a'
>>> print(get_bijective_val(50, 'abcdef'))
>>> 'bef'
>>> print(get_bijective_val(63, 'abcdef'))
>>> 'abcdef'

Solution

  • For larger alphabets this algorithm is much faster than generating all combinations with itertools, but there is probably still a lot of potential for optimizations:

    from math import factorial
    
    
    def get_combination(alphabet, index):
        """Finds the nth combination of any length"""
        alphabet = list(alphabet)
        n = len(alphabet)
        k = 0
    
        # Find length of combination (k)
        while k <= n:
            combination_count = n_choose_k(n, k)
            if index < combination_count:
                # index is within combinations of length k
                break
            else:
                # index is within combinations of length > k
                index -= combination_count
                k += 1
        if k > n:
            raise Exception("Index out of range")
    
        return get_k_combination(alphabet, int(k), index)
    
    
    def get_k_combination(alphabet, k, index):
        """Finds the nth combination of length k"""
        combination = []
        for elem in range(k):
            n = len(alphabet) - 1
            k_ = k - elem - 1
            i = 0
            while n - i >= k_:
                combination_count = n_choose_k(n - i, k_)
                if index < combination_count:
                    combination.append(alphabet[i])
                    alphabet = alphabet[i + 1:]
                    break
                else:
                    index -= combination_count
                    i += 1
        return combination
    
    
    def get_combination_bruteforce(alphabet, index):
        return list(
            [
                x
                for i in range(len(alphabet)+1)
                for x in combinations(alphabet, i)
            ][index]
        )
    
    
    def n_choose_k(n, k):
        return factorial(n) // (factorial(n - k) * factorial(k))
    

    Small CLI and benchmarks here

    For an alphabet of length 15, get_combination took 0.228s and get_combination_bruteforce took 39.525s to find all possible combinations one by one (173 times speedup).

    It works by first finding out how long the combination with the given index is and then reconstructing it element-wise:

    1. The combinations are primarily sorted by length and we know that there are n over k combinations of length k, where n is the alphabet size. So we find the length by subtracting n over k for increasing k from the given index until it goes negative. The last k is the length of the requested combination.
    2. Since the combinations are secondarily sorted by the alphabet order, we know that each element in a combination is only followed by higher elements. So for a combination of length k starting with the x-th element of the alphabet, the remaining combination elements have to be chosen from alphabet[x+1:]. With this information we can reconstruct each element like we calculated k before.

    There may be a closed form solution to both of these steps, but I won't be the one to find it.