
Calculate products of columns according to combinations with replacement


The Problem

It's a bit difficult to explain, but I will try my best. I know the formula for the number of combinations with replacement. Let's say I have 6 vectors: A, B, C, D, E, F. If I want every possible cubic product of these 6 variables, there are (6+3-1)!/(3!(6-1)!) = 56 combinations (listed at the end). Similarly, there are 21 quadratic products, and of course 6 linear ones (each variable by itself). I want to calculate all 6+21+56 = 83 combinations. I am thinking of 3 nested loops, where each inner loop starts iterating from the current index of its outer loop, like this:
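The counts above follow from the combinations-with-replacement formula C(n+k-1, k), with n = 6 variables and product order k. A quick check in Python using `math.comb` (available since Python 3.8), extended to orders 4 and 5:

```python
from math import comb

n = 6  # number of variables (A through F)

# Number of distinct products of order k: C(n + k - 1, k)
counts = {k: comb(n + k - 1, k) for k in range(1, 6)}
print(counts)  # {1: 6, 2: 21, 3: 56, 4: 126, 5: 252}

# Total number of columns when all orders up to 3, or up to 5, are kept
print(sum(counts[k] for k in range(1, 4)))  # 83
print(sum(counts.values()))                 # 461
```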

for i1=1:6
   X(:,?) = X.*X(:,i1)
   for i2=i1:6
      X(:,?) = X.*X(:,i2)
      for i3=i2:6
         X(:,?) = X.*X(:,i3)
      end
   end
end

But the column index on the left-hand side, i.e. where in the 83-column matrix each result should be stored, is confusing me. I have marked those indices with question marks.
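One way to resolve the question marks is to keep a running column counter and increment it after every assignment. The sketch below is in Python/NumPy and assumes the six vectors are stored as the columns of a samples-by-6 array `V` (a hypothetical name, filled here with made-up random data); the output `X` is pre-allocated with 83 columns because the total is known in advance.

```python
import numpy as np

rng = np.random.default_rng(0)
V = rng.integers(1, 10, size=(3, 6)).astype(float)  # hypothetical data: 3 samples of A..F

n = V.shape[1]
X = np.empty((V.shape[0], 83))  # 6 linear + 21 quadratic + 56 cubic columns
k = 0                            # next free column of X (plays the role of "?")
for i1 in range(n):
    X[:, k] = V[:, i1]                              # linear term
    k += 1
    for i2 in range(i1, n):
        X[:, k] = V[:, i1] * V[:, i2]               # quadratic term
        k += 1
        for i3 in range(i2, n):
            X[:, k] = V[:, i1] * V[:, i2] * V[:, i3]  # cubic term
            k += 1

print(k)  # 83: every column of X was filled exactly once
```

Note that this fills the columns depth-first (A, A·A, A·A·A, A·A·B, ..., A·B, A·B·B, ...), so linear, quadratic and cubic terms end up interleaved rather than grouped by order.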

PS: I might need to do this up to 5th order too, which would add another 126 (4th-order) and 252 (5th-order) columns, for a total of 461 columns. So generic code that doesn't hard-code 3rd order is better, but hard-coding 5th order is OK, since I am definitely not going above that.
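For an arbitrary maximum order, a sketch that avoids nested loops altogether: Python's `itertools.combinations_with_replacement` yields the non-decreasing index tuples directly, and each tuple's product is a row-wise `prod` over the selected columns. The function name `poly_products` and the sample array `V` are hypothetical. Unlike nested loops, this groups the output columns by order (all linear terms, then all quadratic terms, and so on).

```python
from itertools import combinations_with_replacement

import numpy as np


def poly_products(V, max_order):
    """Return all products of the columns of V of order 1..max_order.

    Columns are grouped by order; within one order they follow the
    lexicographic order produced by combinations_with_replacement.
    """
    n = V.shape[1]
    cols = []
    for order in range(1, max_order + 1):
        for idx in combinations_with_replacement(range(n), order):
            # Select the columns in idx (repeats allowed) and multiply them row-wise
            cols.append(np.prod(V[:, list(idx)], axis=1))
    return np.column_stack(cols)


V = np.arange(1, 19, dtype=float).reshape(3, 6)  # hypothetical: 3 samples of A..F
print(poly_products(V, 3).shape)  # (3, 83)
print(poly_products(V, 5).shape)  # (3, 461)
```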

Either MATLAB or Python is fine since I can switch easily between both.

The quadratic combinations calculated with an example

Here is an example of the 21 columns I expect for the quadratic combinations of the 6 variables A through F, done in Excel with 3 samples for each vector:

[Image: Excel sheet with the 21 expected quadratic product columns]

The cubic combinations list

Here are the 56 combinations I need to calculate:

A,A,A
A,A,B
A,A,C
A,A,D
A,A,E
A,A,F
A,B,B
A,B,C
A,B,D
A,B,E
A,B,F
A,C,C
A,C,D
A,C,E
A,C,F
A,D,D
A,D,E
A,D,F
A,E,E
A,E,F
A,F,F
B,B,B
B,B,C
B,B,D
B,B,E
B,B,F
B,C,C
B,C,D
B,C,E
B,C,F
B,D,D
B,D,E
B,D,F
B,E,E
B,E,F
B,F,F
C,C,C
C,C,D
C,C,E
C,C,F
C,D,D
C,D,E
C,D,F
C,E,E
C,E,F
C,F,F
D,D,D
D,D,E
D,D,F
D,E,E
D,E,F
D,F,F
E,E,E
E,E,F
E,F,F
F,F,F
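The list above is exactly what Python's `itertools.combinations_with_replacement` produces, in the same order, so it can be generated rather than typed out by hand:

```python
from itertools import combinations_with_replacement

# Non-decreasing triples of the six variables, in lexicographic order
labels = [",".join(c) for c in combinations_with_replacement("ABCDEF", 3)]
print(len(labels))  # 56
print(labels[:3])   # ['A,A,A', 'A,A,B', 'A,A,C']
print(labels[-1])   # F,F,F
```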


Solution

  • This is a vectorized approach in MATLAB that computes the products of a single order P. It should be fast, but it is not memory-efficient, because it generates all n^P Cartesian tuples of column indices (n being the number of columns) and then keeps only those that are non-decreasing.

    x = [2 2 3 2 8 8; 5 1 7 9 4 4; 4 1 2 7 2 9]; % data
    P = 2; % product order
    ind = cell(1,P);
    [ind{end:-1:1}] = ndgrid(1:size(x,2)); % Cartesian power of column indices with order P
    ind = reshape(cat(P+1, ind{:}), [], P); % 2D array where each Cartesian tuple is a row
    ind = ind(all(diff(ind, [], 2)>=0, 2), :); % keep only non-decreasing rows
    % Index into the data, giving an intermediate 3D array (rows x P x tuples),
    % then multiply along the second dimension to compute the products
    result = prod(reshape(x(:,ind.'), size(x,1), P, []), 2);
    result = permute(result, [1 3 2]); % convert to a 2D array: one column per combination
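For readers working in Python, a rough NumPy translation of the same Cartesian-then-filter idea, as a sketch: `itertools.product` plays the role of `ndgrid` (with the last index varying fastest), the `np.diff` test keeps the non-decreasing tuples, and fancy indexing builds the samples-by-tuples-by-P intermediate array. Like the MATLAB version, it computes one order P at a time; concatenating the results for P = 1, ..., 5 would give all 461 columns.

```python
from itertools import product

import numpy as np

x = np.array([[2, 2, 3, 2, 8, 8],
              [5, 1, 7, 9, 4, 4],
              [4, 1, 2, 7, 2, 9]])  # same data as the MATLAB example
P = 2  # product order

n = x.shape[1]
ind = np.array(list(product(range(n), repeat=P)))      # all n**P Cartesian tuples, one per row
ind = ind[np.all(np.diff(ind, axis=1) >= 0, axis=1)]   # keep only non-decreasing rows
result = x[:, ind].prod(axis=2)  # x[:, ind] has shape (samples, tuples, P)

print(ind.shape)     # (21, 2)
print(result.shape)  # (3, 21)
```

The memory cost mentioned above shows up in the `product` step: for n = 6 and P = 5 it builds 6**5 = 7776 tuples only to keep 252 of them.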