pythoncombinationspython-itertoolscombinatorics

Advanced combinatorics in Python: binom(n,2) subsets of binom(n,3) combinations without repetition


I have a list d_n of n integers and the results Z_klm of a function fun(dk, dl, dm) for all binom(n, 3) combinations without repetition (k, l, m) out of d_n indices.

Now, for all binom(n, 2) combinations without repetitions (s, t) of d_n indices, I need to take the T_st partial sums of Z_klm where (s, t) is a subset of (k, l, m).

Here's an example with a small n.

import numpy as np
from itertools import combinations
from scipy.special import binom

# generate random data
n = 5
d_n = np.random.randint(-30, 30, 5)

# define the function
def fun(dk, dl, dm):
    return np.sign(dk - 2*dl + dm)

# calculate Z_klm and store (k,l,m) indices
klm = []
Z_klm = []
for k, l, m in combinations(range(n), 3):
    Z_klm.append(fun(d_n[k], d_n[l], d_n[m]))
    klm.append([k, l, m])

# calculate the partial sums T_st
st = {}
T_st = np.zeros(shape=int(binom(n, 2)))
for h, (s, t) in enumerate(combinations(range(n), 2)):
    st.update({f"({s},{t})": []})
    for i, _klm_ in enumerate(klm):
        if s in _klm_ and t in _klm_:
            T_st[h] += Z_klm[i]
            st[f"({s},{t})"].append(_klm_)

T_st, st
(array([ 1.,  1.,  2.,  2.,  1.,  1.,  1.,  1.,  1., -2.]),

 {'(0,1)': [[0, 1, 2], [0, 1, 3], [0, 1, 4]],
  '(0,2)': [[0, 1, 2], [0, 2, 3], [0, 2, 4]],
  '(0,3)': [[0, 1, 3], [0, 2, 3], [0, 3, 4]],
  '(0,4)': [[0, 1, 4], [0, 2, 4], [0, 3, 4]],
  '(1,2)': [[0, 1, 2], [1, 2, 3], [1, 2, 4]],
  '(1,3)': [[0, 1, 3], [1, 2, 3], [1, 3, 4]],
  '(1,4)': [[0, 1, 4], [1, 2, 4], [1, 3, 4]],
  '(2,3)': [[0, 2, 3], [1, 2, 3], [2, 3, 4]],
  '(2,4)': [[0, 2, 4], [1, 2, 4], [2, 3, 4]],
  '(3,4)': [[0, 3, 4], [1, 3, 4], [2, 3, 4]]})

For example T_{2,4} is the sum of Z_{0,2,4}, Z_{1,2,4}, and Z_{2,3,4}.

My rough implementation is working only because there are very few observations here. But in case of real sample sizes (can usually be up to n = 1000), it would take a lifetime to iterate all over the binom(n,2) and the binom(n,3).

Any suggestions to speed it up with an efficient algorithm or more advanced iteration tools?

P.S.: I do not need to store all (k, l, m) and (s, t) indices; I only did it here to demonstrate how it works and to implement the algorithm for calculating the T_st partial sums.


Solution

  • For large n (like 1000), you will want to avoid the nested loop structure. Here is my recommended code for better performance with n=1000:

    def compute_partial_sums_optimal(d_n, fun):
        n = len(d_n)
        pairs_list = list(combinations(range(n), 2))
        n_pairs = len(pairs_list)
        
        T_st = np.zeros(n_pairs)
        
        # For each pair (s,t), sum over all third indices
        for i, (s, t) in enumerate(pairs_list):
            # All triples containing (s,t)
            for r in range(n):
                if r != s and r != t:
                    k, l, m = sorted([s, t, r])
                    T_st[i] += fun(d_n[k], d_n[l], d_n[m])
        
        return T_st
    
    # Benchmark
    import time
    
    n = 100  # Start with n=100 for testing
    d_n = np.random.randint(-30, 30, n)
    
    start = time.time()
    result = compute_partial_sums_optimal(d_n, fun)
    print(f"Time for n={n}: {time.time() - start:.3f}s")