The following is what I am trying to achieve:
alpha = 'abcdefghijklmnopqrstuvwxyz0123456789'
items = ['item-a','item-b'...'item-n']
my_map = defaultdict()
for i, item in enumerate(items):
my_map[alpha[i]] = item
Here, I am creating a dictionary where each item is mapped against a single character. Every possible combination (i.e. ab7, so 3 items) can randomly be picked after which its value will be processed.
For n items a total of 2^n combinations exist. So for 6 items, the total of combinations that can be picked looks as follows:
['a','b','ab'....'bef'...'abcdef']
NOTE: combinations only occur once. In this situation 'ba'
is the same as 'ab'
, therefor only 'ab'
exists in the list.
Suppose I have the index of the combination, how can I get the combination that belongs to that index without calculating all possible combinations?
I have played around with the following but this only works for all possible combinations
import math
def get_bijective_val(n, alphabet):
base = len(alphabet)
digits = []
while n:
remainder = math.ceil(n/base)-1
digits.append(n - remainder*base)
n = remainder
digits.reverse()
return "".join(alphabet[digit-1] for digit in digits)
get_bijective_val(50,'abcdef')
(taken from https://stackoverflow.com/a/20446640/1251070)
I have tried to modify it but am unable to find the solution
This is the expected result:
>>> print(get_bijective_val(0, 'abcdef'))
>>> 'a'
>>> print(get_bijective_val(50, 'abcdef'))
>>> 'bef'
>>> print(get_bijective_val(63, 'abcdef'))
>>> 'abcdef'
For larger alphabets this algorithm is much faster than generating all combinations with itertools, but there is probably still a lot of potential for optimizations:
from math import factorial
def get_combination(alphabet, index):
"""Finds the nth combination of any length"""
alphabet = list(alphabet)
n = len(alphabet)
k = 0
# Find length of combination (k)
while k <= n:
combination_count = n_choose_k(n, k)
if index < combination_count:
# index is within combinations of length k
break
else:
# index is within combinations of length > k
index -= combination_count
k += 1
if k > n:
raise Exception("Index out of range")
return get_k_combination(alphabet, int(k), index)
def get_k_combination(alphabet, k, index):
"""Finds the nth combination of length k"""
combination = []
for elem in range(k):
n = len(alphabet) - 1
k_ = k - elem - 1
i = 0
while n - i >= k_:
combination_count = n_choose_k(n - i, k_)
if index < combination_count:
combination.append(alphabet[i])
alphabet = alphabet[i + 1:]
break
else:
index -= combination_count
i += 1
return combination
def get_combination_bruteforce(alphabet, index):
return list(
[
x
for i in range(len(alphabet)+1)
for x in combinations(alphabet, i)
][index]
)
def n_choose_k(n, k):
return factorial(n) // (factorial(n - k) * factorial(k))
Small CLI and benchmarks here
For an alphabet of length 15, get_combination
took 0.228s and get_combination_bruteforce
took 39.525s to find all possible combinations one by one (173 times speedup).
It works by first finding out how long the combination with the given index is and then reconstructing it element-wise:
n over k
combinations of length k
, where n
is the alphabet size. So we find the length by subtracting n over k
for increasing k
from the given index until it goes negative. The last k
is the length of the requested combination.k
starting with the x
-th element of the alphabet, the remaining combination elements have to be chosen from alphabet[x+1:]
. With this information we can reconstruct each element like we calculated k
before.There may be a closed form solution to both of these steps, but I won't be the one to find it.