pythonperformancecombinationspermutationmemory-efficient

How to generate all length n combinations of k elements (with replacement) that contain each element at least once?


I have a list of 7 elements, and I'm going to fill another list of length 50 with these elements. I'd like to generate a dataframe where each row represents one possible way of selecting these 7 elements into 50 slots. However, I am only interested in the arrangements that contain each of the 7 elements at least once.

This is the approach I have currently, but would love any suggestions on how to write this more efficiently to save time (it takes a long time to run).

from math import comb
import numpy as np
import pandas as pd
import itertools

elements = [1,2,3,4,5,6,7]

combos = pd.DataFrame(itertools.combinations_with_replacement(range(7),50))

keep_rows = combos.apply(lambda row: np.sum([a in list(row) for a in elements])==7,axis=1)

Is there a faster way to accomplish this?

Edit Also, I only need arrangements in ascending order (hence using itertools.combinations_with_replacement()). For example, [1,2,3,4,5,6,7,...,7] is of interest but [7,1,2,3,4,5,6,...,7] is not of interest.

Edit 2 List had 8 elements by mistake.


Solution

  • The issue with your approach is that you're generating a lot of useless combinations. I didn't do the math for, or computed, your original 50 slots case; but let's consider a simpler case, with 7 elements in 20 slots: you would generate 230230 combos, and then discard most of them and keep only 27132 - and of course when the number of slots increases, so does the number of discarded combos.

    The solution is to generate only the combos you need: in the example above, use combinations_with_replacement(range(7), 20-7), which yields exactly 27132 combos, and append to each of them the list of elements so that each element is guaranteed to appear at least once by definition.

    Re your request to have the elements sorted, combinations_with_replacement() outputs its result in lexicographic order, so it's not an issue. With my solution, however, you will need to sort each combo once you appended the list of elements to it. This will slow things down quite a bit, but it still should be much faster than creating lots of useless results just to dicard them.