algorithmcombinations

How to generate combinations in chunks


I have algorithm which performs calculations for every combination of input elements, e.g. for every 5-element subset out of 100-element set. I am porting it to GPU, and now I have initial version of it ready. In order to speed it up I would like to load data from local memory, which is limited (e.g. 32KB) and can hold something like 20 input elements out of 100. So I have to somehow partition my work and generate combinations in chunks. And now this is the hard part, how to do this. Most probably I have to first load data for 20 elements and perform calculations for 5-elements subsets of these 20 elements. After this I would have to replace some (or all) of them with new ones and perform calculations for them, then rinse and repeat. Could you tell me how should I choose replaced elements in local memory, to avoid duplicate work? So far I came to conclusion that I would have to replace at least 16 of them at once to avoid duplicated work problem.

Edit: here is example for generating of 2-element combinations out of 5 elements. Here is full list of all possible cases:

1, 2
1, 3
1, 4
1, 5
2, 3
2, 4
2, 5
3, 4
3, 5
4, 5

Local memory on GPU is limited - lets assume it can hold 3 elements only. So I have to somehow divide my problem into generating combinations of 2 elements out of 3. I have to repeat this multiple times, until I get all combinations from above list. As a first step I can load elements 1, 2, 3 into local memory, so I will get following combinations:

1, 2
1, 3
2, 3

Now I have to load another set of elements and calculate combinations for them. It can be 1, 4, 5. It will produce following combinations:

1, 4
1, 5
4, 5

On the other hand set 1, 2, 4 is invalid - it would result in duplicated combination:

1, 2 // duplicate
1, 4 // ok, new
2, 4 // ok, new

After this step there are 4 more combinations to be generated (list is below). Algorithm has to be able to generate another 2-element combination out of 3 elements, and to somehow handle last (10th) combination.

2, 4
2, 5
3, 4
3, 5

By splitting work in this way, I would be able to process all combinations of original input set using limited local memory which can hold only part of it.


Solution

  • Say e.g. you have 100 elements, you can hold 20 in memory, and you need all the 5-element combinations, of which there are C(100,5) = 75,287,520.

    In order to be able to generate all the combinations, every combination of 5 elements has to be in memory at some point. This can be done by dividing the elements into groups of 20/5 = 4 elements; there are 25 of these groups in the input, and C(25,5) = 53,130 combinations of 5 groups.

    For every combination of groups, we will start by generating the combinations with one element from each of the five groups; this gives us 53,130 x 45 = 54,405,120 unique combinations.

    We now have the combinations where each element comes from a different group, which is the partition [1,1,1,1,1]. We still have to find the combinations for the partitions [2,1,1,1], [2,2,1], [3,1,1], [3,2] and [4,1]. The easiest way will be to do this in seperate stages, but the fastest way will of course be to incorporate these into the first stage for [1,1,1,1,1], because all the combinations of groups we'll ever need are loaded in memory at some point during the first stage.

    For the partition [2,1,1,1], we'd load every group in turn as the group with 2 elements, and then load every combination of 3 groups from the remaining 24 groups and take an element from each of them. That would require 25 x C(24,3) = 50,600 steps, each resulting in C(4,2) x 43 = 384 combinations, or a total of 19,430,400.

    A partition like [2,2,1] is a bit different, because we'd load every group in turn to be the first group with 2 elements, but only the groups that come after it as the second group with 2 elements, to avoid duplicates. Then for each of these, we'd load each of the other 23 groups to get the final element. That would require C(25,2)/2 x 23 = 6,900 steps, each resulting in C(4,2) x C(4,2) x C(4,1) = 144 combinations, for a total of 993,600.

    The partition [3,1,1] requires 25 x C(24,2) = 25 x 276 = 6,900 steps, each resulting in C(4,3) x 42 = 64 combinations, for a total of 441,600.

    The partition [3,2] requires 25 x 24 = 600 steps, each resulting in C(4,3) x C(4,2) = 24 combinations, for a total of 14,400.

    The partition [4,1] requires 25 x 24 = 600 steps, each resulting in C(4,4) x C(4,1) = 4 combinations, for a total of 2,400.

    So we have a total of:

    [1,1,1,1,1] -> 54,405,120
    [2,1,1,1]   -> 19,430,400
    [2,2,1]     ->    993,600
    [3,1,1]     ->    441,600
    [3,2]       ->     14,400
    [4,1]       ->      2,400
                   ----------
                   75,287,520 combinations
    

    As you'll have noticed, the partitions [3,2] and [4,1] both require every combination of two groups, so they can be easily integrated into one stage. Of course, if you integrate them all into the first stage for [1,1,1,1,1], you'll only have to load 53,130 combinations of groups into memory, which is the absolute minimum.

    (If it's faster to load only one new group of elements into memory at each step, instead of running through the combinations of groups in lexicographical order, check out this answer.)


    Integration of different stages

    The simplest way to run through all the group combinations for the partition [1,1,1,1,1] is to load groups 1 to 21 as group A, then all groups after A up to 22 as group B, all groups after B up to 23 as group C, all groups after C up to 24 as group D, and all groups after D up to 25 as group E.

     A  B  C  D  E
     1  2  3  4  5    <- ABCDE
     1  2  3  4  6    <- ABCDE
    ...
     1  2  3  4 25    <- ABCDE
     1  2  3  5  6    <- ABCDE
    ...
     1  2  3 24 25    <- ABCDE
     1  2  4  5  6    <- ABCDE
    ...
     1  2 23 24 25    <- ABCDE
     1  3  4  5  6    <- ABCDE
    ...
     1 22 23 24 25    <- ABCDE
     2  3  4  5  6    <- ABCDE
    ...
    21 22 23 24 25    <- ABCDE
    

    The partitions with four parts can be integrated by taking [2,1,1,1], [1,2,1,1], [1,1,2,1] and [1,1,1,2] elements from these combinations of four groups:

     A  B  C  D  E
     1  2  3  4  5    <- ABCD ABCE ABDE ACDE BCDE
     1  2  3  4  6    <- ABCE ABDE ACDE BCDE
     ...        
     1  2  3  4 25    <- ABCE ABDE ACDE BCDE
     1  2  3  5  6    <- ABDE ACDE BCDE
     ...        
     1  2  3 24 25    <- ABDE ACDE BCDE
     1  2  4  5  6    <- ACDE BCDE
    ...
     1  2 23 24 25    <- ACDE BCDE
     1  3  4  5  6    <- BCDE
    ...
     1 22 23 24 25    <- BCDE
     2  3  4  5  6    <- none
    ...
    21 22 23 24 25    <- none
    

    The partitions with three parts can be integrated by taking [2,2,1], [2,1,2], [1,2,2], [3,1,1], [1,3,1] and [1,1,3] elements from these combinations of three groups:

     A  B  C  D  E
     1  2  3  4  5    <- ABC ABD ACD BCD ABE ACE BCE ADE BDE CDE
     1  2  3  4  6    <- ABE ACE BCE ADE BDE CDE
    ...
     1  2  3  4 25    <- ABE ACE BCE ADE BDE CDE
     1  2  3  5  6    <- ADE BDE CDE
    ...     
     1  2  3 24 25    <- ADE BDE CDE
     1  2  4  5  6    <- CDE
    ...
     1  2 23 24 25    <- CDE
     1  3  4  5  6    <- none
    ...
    21 22 23 24 25    <- none
    

    The partitions with two parts can be integrated by taking [2,3], [3,2], [4,1] and [1,4] elements from these combinations of two groups:

     A  B  C  D  E
     1  2  3  4  5    <- AB AC BC AD BD CD AE BE CE DE
     1  2  3  4  6    <- AE BE CE DE
    ...
     1  2  3  4 25    <- AE BE CE DE
     1  2  3  5  6    <- DE
    ...
     1  2  3 24 25    <- DE
     1  2  4  5  6    <- none
    ...
    21 22 23 24 25    <- none
    

    In general

    There are e elements, m elements can be loaded in memory, and you want all combinations of k elements. Use k groups of size g = m/k.

    Generate all partitions of k with parts limited to size g:

    [1,1,1 ... 1] [2,1,1 ... 1] [2,2,1 ... 1] ... [k]        (if k <= g)
    [1,1,1 ... 1] [2,1,1 ... 1] [2,2,1 ... 1] ... [g,k-g]    (if k > g)
    

    For each of these, generate all unique permutations, e.g.:

    [3,2,2,1] -> [3,2,2,1] [3,2,1,2] [3,1,2,2]
                 [2,3,2,1] [2,3,1,2] [1,3,2,2]
                 [2,2,3,1] [2,1,3,2] [1,2,3,2]
                 [2,2,1,3] [2,1,2,3] [1,2,2,3]
    

    Sort the permutations per number of parts, e.g.:

    k:   [1,1,1 ... 1]
    k-1: [2,1 ... 1] [1,2 ... 1] ... [1,1 ... 2]
    ...
    2:   [g,k-g] [k-g,g]
    

    Load the first k groups into memory, e.g.:

    A  B  C  D  E  F
    1  2  3  4  5  6
    

    For every length of partition p, generate every set of groups of size p, e.g.:

    p=k:   ABCDEF                                 C(k,k)   sets
    p=k-1: ABCDE ABCDF ABCEF ABDEF ACDEF BCDEF    C(k,k-1) sets
    p=k-2: ABCD ABCE ABCF ABDE ABDF ... CDEF      C(k,k-2) sets
    ...
    p=2:   AB AC AD AE AF BC BD BE BF ... EF      C(k,2)   sets
    

    For each of these sets, generate the combinations for the partitions with the corresponding number of parts, e.g.:

    p=k-1: ABCDE [2,1,1,1,1] -> [a,a,b,c,d,e]    C(g,2)*C(g,1)^4 combinations
                 [1,2,1,1,1] -> [a,b,b,c,d,e]
                 [1,1,2,1,1] -> [a,b,c,c,d,e]
                 [1,1,1,2,1] -> [a,b,c,d,d,e]
                 [1,1,1,1,2] -> [a,b,c,d,e,e]
           ABCDE [2,1,1,1,1] -> [a,a,b,c,d,f]
                 [1,2,1,1,1] -> [a,b,b,c,d,f]
                 [1,1,2,1,1] -> [a,b,c,c,d,f]
                 [1,1,1,2,1] -> [a,b,c,d,d,f]
                 [1,1,1,1,2] -> [a,b,c,d,f,f]
           ...
           BCDEF [2,1,1,1,1] -> [b,b,c,d,e,f]
                 [1,2,1,1,1] -> [b,c,c,d,e,f]
                 [1,1,2,1,1] -> [b,c,d,d,e,f]
                 [1,1,1,2,1] -> [b,c,d,e,e,f]
                 [1,1,1,1,2] -> [b,c,d,e,f,f]
    

    From the list of sets, remove the sets which do not contain the last group (F):

    p=k:   ABCDEF
    p=k-1: ABCDF ABCEF ABDEF ACDEF BCDEF
    p=k-2: ABCF ABDF ABEF ACDF ACEF ADEF BCDF BCEF BDEF CDEF
    ...
    p=2:   AF BF CF DF EF
    

    Load the next groups up to e/g into memory as group F, e.g.:

    A  B  C  D  E  F
    1  2  3  4  5  7
    ...
    1  2  3  4  5 e/g
    

    Again, for each of these, and for each of the sets, generate the combinations for the partitions with the corresponding number of parts.

    From the list of sets, remove the sets which do not contain the two last groups (EF):

    p=k:   ABCDEF
    p=k-1: ABCEF ABDEF ACDEF BCDEF
    p=k-2: ABEF ACEF ADEF BCEF BDEF CDEF
    ...
    p=2:   EF
    

    Load the next groups up to e/g-1 into memory as group E, and for each of these, load the groups after E up to e/g into memory as group F, e.g.:

    A    B    C    D    E    F
    1    2    3    4    6    7
    ...
    1    2    3    4  e/g-1 e/g
    

    Again, for each of these, and for each of the sets, generate the combinations for the partitions with the corresponding number of parts.

    From the list of sets, remove the sets which do not contain the three last groups (DEF):

    p=k:   ABCDEF
    p=k-1: ABDEF ACDEF BCDEF
    p=k-2: ADEF BDEF CDEF
    ...
    p=2:   none
    

    Load the next groups up to e/g-2 into memory as group D, and for each of these, load the groups after D up to e/g-1 into memory as group E, and for each of these, load the groups after E up to e/g into memory as group F, e.g.:

    A     B     C     D     E     F
    1     2     3     5     6     7
    ...
    1     2     3   e/g-2 e/g-1  e/g
    

    Again, for each of these, and for each of the sets, generate the combinations for the partitions with the corresponding number of parts.

    And so on, until you reach:

      A     B     C     D     E     F
    e/g-5 e/g-4 e/g-3 e/g-2 e/g-1  e/g
    

    with only:

    p=k:   ABCDEF
    

    Run-through for 21,9,3

    number of elements: e = 21
    elements: [1,2,3,4,5,6,7,8,9,10,11,12,13,14,15,16,17,18,19,20,21]
    size of combinations: k = 3 elements
    size of memory: m = 9 elements

    Preparations:

    number of groups in memory: 3 (k)  
    group size: g = m/k = 3 elements  
    number of groups: e/g = 7  
    groups: 1:[1,2,3] 2:[4,5,6] 3:[7,8,9] 4:[10,11,12] 5:[13,14,15] 6:[16,17,18] 7:[19,20,21]  
    number of element sets loaded into memory: C(e/g,k) = C(7,3) = 35  
    partitions of k with max part g: [1,1,1] [2,1] [3]  
    permutations: 3:{[1,1,1]} 2:{[1,2],[2,1]} 1:{[3]}  
    group sets: 3:{[A,B,C]} 2:{[A,B],[A,C],[B,C]} 1:{[A],[B],[C]}  
    

    Phase 1:

    group sets: 3:{[A,B,C]} 2:{[A,B],[A,C],[B,C]} 1:{[A],[B],[C]}  (all)
    
    A B C
    1 2 3 -> elements in memory: [1,2,3] [4,5,6] [7,8,9] -> 84 combinations
    
    3: [1,1,1]:[A,B,C] -> [a,b,c] -> [1,4,7] [1,4,8] [1,4,9] [1,5,7] [1,5,8] [1,5,9] [1,6,7] [1,6,8] [1,6,9]
                                     [2,4,7] [2,4,8] [2,4,9] [2,5,7] [2,5,8] [2,5,9] [2,6,7] [2,6,8] [2,6,9]
                                     [3,4,7] [3,4,8] [3,4,9] [3,5,7] [3,5,8] [3,5,9] [3,6,7] [3,6,8] [3,6,9]
    
    2: [1,2]:[A,B] -> [a,b,b] -> [1,4,5] [1,4,6] [1,5,6] [2,4,5] [2,4,6] [2,5,6] [3,4,5] [3,4,6] [3,5,6]
       [1,2]:[A,C] -> [a,c,c] -> [1,7,8] [1,7,9] [1,8,9] [2,7,8] [2,7,9] [2,8,9] [3,7,8] [3,7,9] [3,8,9]
       [1,2]:[B,C] -> [b,c,c] -> [4,7,8] [4,7,9] [4,8,9] [5,7,8] [5,7,9] [5,8,9] [6,7,8] [6,7,9] [6,8,9]
       [2,1]:[A,B] -> [a,a,b] -> [1,2,4] [1,3,4] [2,3,4] [1,2,5] [1,3,5] [2,3,5] [1,2,6] [1,3,6] [2,3,6]
       [2,1]:[A,C] -> [a,a,c] -> [1,2,7] [1,3,7] [2,3,7] [1,2,8] [1,3,8] [2,3,8] [1,2,9] [1,3,9] [2,3,9]
       [2,1]:[B,C] -> [b,b,c] -> [4,5,7] [4,6,7] [5,6,7] [4,5,8] [4,6,8] [5,6,8] [4,5,9] [4,6,9] [5,6,9]
    
    1: [3]:[A] -> [a,a,a] -> [1,2,3]
       [3]:[B] -> [b,b,b] -> [4,5,6]
       [3]:[C] -> [c,c,c] -> [7,8,9]
    

    Phase 2:

    group sets: 3:{[A,B,C]} 2:{[A,C],[B,C]} 1:{[C]}  (sets without C removed)
    
    A B C
    1 2 4 -> elements in memory: [1,2,3] [4,5,6] [10,11,12] -> 64 combinations
    
    3: [1,1,1]:[A,B,C] -> [a,b,c] -> [1,4,10] [1,4,11] [1,4,12] [1,5,10] [1,5,11] [1,5,12] [1,6,10] [1,6,11] [1,6,12]
                                     [2,4,10] [2,4,11] [2,4,12] [2,5,10] [2,5,11] [2,5,12] [2,6,10] [2,6,11] [2,6,12]
                                     [3,4,10] [3,4,11] [3,4,12] [3,5,10] [3,5,11] [3,5,12] [3,6,10] [3,6,11] [3,6,12]
    
    2: [1,2]:[A,C] -> [a,c,c] -> [1,10,11] [1,10,12] [1,11,12] [2,10,11] [2,10,12] [2,11,12] [3,10,11] [3,10,12] [3,11,12]
       [1,2]:[B,C] -> [b,c,c] -> [4,10,11] [4,10,12] [4,11,12] [5,10,11] [5,10,12] [5,11,12] [6,10,11] [6,10,12] [6,11,12]
       [2,1]:[A,C] -> [a,a,c] -> [1,2,10] [1,3,10] [2,3,10] [1,2,11] [1,3,11] [2,3,11] [1,2,12] [1,3,12] [2,3,12]
       [2,1]:[B,C] -> [b,b,c] -> [4,5,10] [4,6,10] [5,6,10] [4,5,11] [4,6,11] [5,6,11] [4,5,12] [4,6,12] [5,6,12]
    
    1: [3]:[C] -> [c,c,c] -> [10,11,12]
    
    A B C
    1 2 5 -> elements in memory: [1,2,3] [4,5,6] [13,14,15] -> 64 combinations
    1 2 6 -> elements in memory: [1,2,3] [4,5,6] [16,17,18] -> 64 combinations
    1 2 7 -> elements in memory: [1,2,3] [4,5,6] [19,20,21] -> 64 combinations
    

    Phase 3:

    group sets: 3:{[A,B,C]} 2:{[B,C]}  (sets without B removed)
    
    A B C
    1 3 4 -> elements in memory: [1,2,3] [7,8,9] [10,11,12] -> 45 combinations
    
    3: [1,1,1]:[A,B,C] -> [a,b,c] -> [1,7,10] [1,7,11] [1,7,12] [1,8,10] [1,8,11] [1,8,12] [1,9,10] [1,9,11] [1,9,12]
                                     [2,7,10] [2,7,11] [2,7,12] [2,8,10] [2,8,11] [2,8,12] [2,9,10] [2,9,11] [2,9,12]
                                     [3,7,10] [3,7,11] [3,7,12] [3,8,10] [3,8,11] [3,8,12] [3,9,10] [3,9,11] [3,9,12]
    
    2: [1,2]:[B,C] -> [b,c,c] -> [7,10,11] [7,10,12] [7,11,12] [8,10,11] [8,10,12] [8,11,12] [9,10,11] [9,10,12] [9,11,12]
       [2,1]:[B,C] -> [b,b,c] -> [7,8,10] [7,9,10] [8,9,10] [7,8,11] [7,9,11] [8,9,11] [7,8,12] [7,9,12] [8,9,12]
    
    A B C
    1 3 5 -> elements in memory: [1,2,3] [7,8,9] [13,14,15] -> 45 combinations
    1 3 6 -> elements in memory: [1,2,3] [7,8,9] [16,17,18] -> 45 combinations
    1 3 7 -> elements in memory: [1,2,3] [7,8,9] [19,20,21] -> 45 combinations
    1 4 5 -> elements in memory: [1,2,3] [7,8,9] [13,14,15] -> 45 combinations
    1 4 6 -> elements in memory: [1,2,3] [7,8,9] [16,17,18] -> 45 combinations
    1 4 7 -> elements in memory: [1,2,3] [7,8,9] [19,20,21] -> 45 combinations
    1 5 6 -> elements in memory: [1,2,3] [7,8,9] [16,17,18] -> 45 combinations
    1 5 7 -> elements in memory: [1,2,3] [7,8,9] [19,20,21] -> 45 combinations
    1 6 7 -> elements in memory: [1,2,3] [7,8,9] [19,20,21] -> 45 combinations
    

    Phase 4:

    group sets: 3:{[A,B,C]}  (sets without A removed)
    
    A B C
    2 3 4 -> elements in memory: [4,5,6] [7,8,9] [10,11,12]       -> 27 combinations
    
    3: [1,1,1]:[A,B,C] -> [a,b,c] -> [4,7,10] [4,7,11] [4,7,12] [4,8,10] [4,8,11] [4,8,12] [4,9,10] [4,9,11] [4,9,12]
                                     [5,7,10] [5,7,11] [5,7,12] [5,8,10] [5,8,11] [5,8,12] [5,9,10] [5,9,11] [5,9,12]
                                     [6,7,10] [6,7,11] [6,7,12] [6,8,10] [6,8,11] [6,8,12] [6,9,10] [6,9,11] [6,9,12]
    
    A B C
    2 3 5 -> elements in memory: [4,5,6] [7,8,9] [13,14,15]       -> 27 combinations
    2 3 6 -> elements in memory: [4,5,6] [7,8,9] [16,17,18]       -> 27 combinations
    2 3 7 -> elements in memory: [4,5,6] [7,8,9] [19,20,21]       -> 27 combinations
    2 4 5 -> elements in memory: [4,5,6] [10,11,12] [13,14,15]    -> 27 combinations
    2 4 6 -> elements in memory: [4,5,6] [10,11,12] [16,17,18]    -> 27 combinations
    2 4 7 -> elements in memory: [4,5,6] [10,11,12] [19,20,21]    -> 27 combinations
    2 5 6 -> elements in memory: [4,5,6] [13,14,15] [16,17,18]    -> 27 combinations
    2 5 7 -> elements in memory: [4,5,6] [13,14,15] [19,20,21]    -> 27 combinations
    2 6 7 -> elements in memory: [4,5,6] [16,17,18] [19,20,21]    -> 27 combinations
    3 4 5 -> elements in memory: [7,8,9] [10,11,12] [13,14,15]    -> 27 combinations
    3 4 6 -> elements in memory: [7,8,9] [10,11,12] [16,17,18]    -> 27 combinations
    3 4 7 -> elements in memory: [7,8,9] [10,11,12] [19,20,21]    -> 27 combinations
    3 5 6 -> elements in memory: [7,8,9] [13,14,15] [16,17,18]    -> 27 combinations
    3 5 7 -> elements in memory: [7,8,9] [13,14,15] [19,20,21]    -> 27 combinations
    3 6 7 -> elements in memory: [7,8,9] [16,17,18] [19,20,21]    -> 27 combinations
    4 5 6 -> elements in memory: [10,11,12] [13,14,15] [16,17,18] -> 27 combinations
    4 5 7 -> elements in memory: [10,11,12] [13,14,15] [19,20,21] -> 27 combinations
    4 6 7 -> elements in memory: [10,11,12] [16,17,18] [19,20,21] -> 27 combinations
    5 6 7 -> elements in memory: [13,14,15] [16,17,18] [19,20,21] -> 27 combinations
    

    Results:

    Phase 1:      84 =   84 combinations
    Phase 2:  4 x 64 =  256 combinations
    Phase 3: 10 x 45 =  450 combinations
    Phase 4: 20 x 27 =  540 combinations
                       ----
                       1330 combinations = C(21,3)