I have a list d_n of n integers and the results Z_klm of a function fun(dk, dl, dm) for all binom(n, 3) combinations without repetition (k, l, m) out of d_n indices.
Now, for all binom(n, 2) combinations without repetitions (s, t) of d_n indices, I need to take the T_st partial sums of Z_klm where (s, t) is a subset of (k, l, m).
Here's an example with a small n.
import numpy as np
from itertools import combinations
from scipy.special import binom
# generate random data
n = 5
d_n = np.random.randint(-30, 30, 5)
# define the function
def fun(dk, dl, dm):
return np.sign(dk - 2*dl + dm)
# calculate Z_klm and store (k,l,m) indices
klm = []
Z_klm = []
for k, l, m in combinations(range(n), 3):
Z_klm.append(fun(d_n[k], d_n[l], d_n[m]))
klm.append([k, l, m])
# calculate the partial sums T_st
st = {}
T_st = np.zeros(shape=int(binom(n, 2)))
for h, (s, t) in enumerate(combinations(range(n), 2)):
st.update({f"({s},{t})": []})
for i, _klm_ in enumerate(klm):
if s in _klm_ and t in _klm_:
T_st[h] += Z_klm[i]
st[f"({s},{t})"].append(_klm_)
T_st, st
(array([ 1., 1., 2., 2., 1., 1., 1., 1., 1., -2.]),
{'(0,1)': [[0, 1, 2], [0, 1, 3], [0, 1, 4]],
'(0,2)': [[0, 1, 2], [0, 2, 3], [0, 2, 4]],
'(0,3)': [[0, 1, 3], [0, 2, 3], [0, 3, 4]],
'(0,4)': [[0, 1, 4], [0, 2, 4], [0, 3, 4]],
'(1,2)': [[0, 1, 2], [1, 2, 3], [1, 2, 4]],
'(1,3)': [[0, 1, 3], [1, 2, 3], [1, 3, 4]],
'(1,4)': [[0, 1, 4], [1, 2, 4], [1, 3, 4]],
'(2,3)': [[0, 2, 3], [1, 2, 3], [2, 3, 4]],
'(2,4)': [[0, 2, 4], [1, 2, 4], [2, 3, 4]],
'(3,4)': [[0, 3, 4], [1, 3, 4], [2, 3, 4]]})
For example T_{2,4} is the sum of Z_{0,2,4}, Z_{1,2,4}, and Z_{2,3,4}.
My rough implementation is working only because there are very few observations here. But in case of real sample sizes (can usually be up to n = 1000), it would take a lifetime to iterate all over the binom(n,2) and the binom(n,3).
Any suggestions to speed it up with an efficient algorithm or more advanced iteration tools?
P.S.: I do not need to store all (k, l, m) and (s, t) indices; I only did it here to demonstrate how it works and to implement the algorithm for calculating the T_st partial sums.
For large n (like 1000), you will want to avoid the nested loop structure. Here is my recommended code for better performance with n=1000:
def compute_partial_sums_optimal(d_n, fun):
n = len(d_n)
pairs_list = list(combinations(range(n), 2))
n_pairs = len(pairs_list)
T_st = np.zeros(n_pairs)
# For each pair (s,t), sum over all third indices
for i, (s, t) in enumerate(pairs_list):
# All triples containing (s,t)
for r in range(n):
if r != s and r != t:
k, l, m = sorted([s, t, r])
T_st[i] += fun(d_n[k], d_n[l], d_n[m])
return T_st
# Benchmark
import time
n = 100 # Start with n=100 for testing
d_n = np.random.randint(-30, 30, n)
start = time.time()
result = compute_partial_sums_optimal(d_n, fun)
print(f"Time for n={n}: {time.time() - start:.3f}s")