Suppose I have an R × C contingency table, i.e. one with R rows and C columns. I want a matrix X of dimension RC × (R + C − 2) that contains the R − 1 “main effects” for the rows and the C − 1 “main effects” for the columns. For example, with R = C = 2 (row levels [0, 1] and column levels [0, 1]) and main effects only, there are various ways to parameterize the design matrix X; below is one:
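1 1
1 0
0 1
0 0
Note that this is 4 × 2 = RC × (R + C − 2). The cells are listed in the order (0, 0), (0, 1), (1, 0), (1, 1). The first column is the indicator for row level 0 and the second is the indicator for column level 0. Level 1 of each factor is omitted as the reference level.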
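One compact way to write this uses Kronecker products. It assumes the cells are listed in row-major order ((0, 0), (0, 1), …, (R − 1, C − 1)) and that the last level of each factor is the omitted reference. Here 1_k is a length-k column of ones and E_k is the first k − 1 columns of the k × k identity I_k; both symbols are introduced only for this sketch. Keeping all R row indicators and all C column indicators would give two blocks that each sum to the all-ones column, so the full RC × (R + C) matrix has rank R + C − 1. Dropping one level from each factor leaves R + C − 2 linearly independent columns.

$$
E_k = \begin{bmatrix} I_{k-1} \\ \mathbf{0}^\top \end{bmatrix} \in \mathbb{R}^{k \times (k-1)},
\qquad
X = \begin{bmatrix} E_R \otimes \mathbf{1}_C & \mathbf{1}_R \otimes E_C \end{bmatrix} \in \mathbb{R}^{RC \times (R + C - 2)}.
$$

For R = C = 2, E_2 = (1, 0)ᵀ, so

$$
X = \begin{bmatrix} (1,0)^\top \otimes (1,1)^\top & (1,1)^\top \otimes (1,0)^\top \end{bmatrix}
  = \begin{bmatrix} 1 & 1 \\ 1 & 0 \\ 0 & 1 \\ 0 & 0 \end{bmatrix}.
$$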
How can I do this in Python for any values of R and C, e.g. R = 3, C = 4 (levels [0, 1, 2] and [0, 1, 2, 3])? I only have the values of R and C, but I can use them to construct the level arrays with np.arange(R) and np.arange(C).
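The following is a minimal loop-free sketch that builds the RC × (R + C − 2) matrix directly from np.arange(R) and np.arange(C). It assumes row-major cell order and uses the last level of each factor as the omitted reference; other choices give equally valid parameterizations. The function name `main_effects_design` is hypothetical. The function labels every cell with its row level and column level, then uses broadcasting to compare those labels against the kept levels:

```python
import numpy as np


def main_effects_design(R, C):
    """Hypothetical helper: RC x (R + C - 2) main-effects design matrix.

    Cells are ordered row-major: (0, 0), (0, 1), ..., (R - 1, C - 1).
    The last level of each factor is the omitted reference level.
    """
    row_of_cell = np.repeat(np.arange(R), C)  # [0, 0, ..., 1, 1, ...]
    col_of_cell = np.tile(np.arange(C), R)    # [0, 1, ..., 0, 1, ...]
    # Indicator columns for row levels 0..R-2 and column levels 0..C-2.
    X_rows = (row_of_cell[:, None] == np.arange(R - 1)).astype(float)
    X_cols = (col_of_cell[:, None] == np.arange(C - 1)).astype(float)
    return np.hstack([X_rows, X_cols])


X = main_effects_design(3, 4)
print(X.shape)  # (12, 5)
```

For R = C = 2 this returns the rows [1, 1], [1, 0], [0, 1], [0, 0]. To omit a different level, compare against a different set of kept levels instead of np.arange(R - 1) and np.arange(C - 1).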
The following builds one indicator column for every row level and every column level, with the cells in row-major order:
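```python
import numpy as np

R = 3
C = 2

# Row-level indicators: start with a 1 for every cell in row 0 (row-major order),
# then shift by C cells to move on to the next row level.
ir = np.zeros((R, C))
ir[0, :] = 1
ir = ir.ravel()
mat = []
for i in range(R):
    mat.append(ir)
    ir = np.roll(ir, C)

# Column-level indicators: start with a 1 for every cell in column 0,
# then shift by one cell to move on to the next column level.
ic = np.zeros((R, C))
ic[:, 0] = 1
ic = ic.ravel()
for i in range(C):
    mat.append(ic)
    ic = np.roll(ic, 1)

# One column per row level, then one per column level: shape (R*C, R + C).
mat = np.asarray(mat).T
```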
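For R = 3 and C = 2 the result is the 6 × 5 matrix below. It keeps an indicator for every level, so it has R + C columns. Dropping one row-level column and one column-level column gives the RC × (R + C − 2) matrix asked for: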
```
array([[ 1., 0., 0., 1., 0.],
       [ 1., 0., 0., 0., 1.],
       [ 0., 1., 0., 1., 0.],
       [ 0., 1., 0., 0., 1.],
       [ 0., 0., 1., 1., 0.],
       [ 0., 0., 1., 0., 1.]])
```
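The result above keeps all R + C indicators. Below is a sketch of reducing it to the RC × (R + C − 2) matrix from the question, assuming the last level of each factor is the one to omit. np.delete removes column R − 1 (the last row level) and column R + C − 1 (the last column level). The rank checks show why one level is dropped from each factor: the row indicators and the column indicators each sum to a column of ones, so the full matrix has one exact linear dependency. The result array is typed in so the snippet runs on its own.

```python
import numpy as np

R, C = 3, 2
# The 6 x 5 result shown above: R row-level columns, then C column-level columns.
mat = np.array([[1., 0., 0., 1., 0.],
                [1., 0., 0., 0., 1.],
                [0., 1., 0., 1., 0.],
                [0., 1., 0., 0., 1.],
                [0., 0., 1., 1., 0.],
                [0., 0., 1., 0., 1.]])

# Drop the last row level (column R - 1) and the last column level (column R + C - 1).
X = np.delete(mat, [R - 1, R + C - 1], axis=1)
print(X.shape)  # (6, 3)

print(np.linalg.matrix_rank(mat))  # 4 = R + C - 1: one exact linear dependency
print(np.linalg.matrix_rank(X))    # 3 = R + C - 2: full column rank
# With an intercept column added, the reduced matrix is still full column rank.
X_int = np.hstack([np.ones((R * C, 1)), X])
print(np.linalg.matrix_rank(X_int))  # 4
```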