I'm trying to implement a perceptron in Python using NumPy. When I use the notation z = XW + b, everything works fine. While studying ML, though, I often see z = WX + b, especially in the context of neural networks. The problem is that with my matrices the dimensions don't add up. I tried following some answers on the web, but the output doesn't have the right dimensions, and ChatGPT only gave me code following the z = XW + b notation. This is the code I used for z = XW + b:
import numpy as np
n_inpts = 10
in_feats = 5
n_hidden = 8
out_feats = 1
X = np.random.randn(n_inpts,in_feats)
W_x = np.random.randn(in_feats, n_hidden)
bias_h = np.random.randn(1, n_hidden)
H = np.dot(X,W_x) + bias_h
#H is nxh
H = np.maximum(0, H)  # elementwise ReLU (vectorized, no need for np.vectorize)
W_h = np.random.randn(n_hidden, out_feats)
bias_o = np.random.randn(1, out_feats)
output = np.dot(H, W_h) + bias_o
Can anybody give me an implementation that produces the same result using z = WX + b? Every single implementation I have found follows the z = XW + b notation. I guess it comes down to how you define the X and W matrices, but so far I have had no luck finding a solution.
Just transpose everything!
In the example I transpose each W and X so that they contain the same random numbers as in your version, which lets you compare the results. Normally, though, you would define each variable already transposed, as in the commented lines.
import numpy as np
np.random.seed(42) # for reproducible results
n_inpts = 10
in_feats = 5
n_hidden = 8
out_feats = 1
X = np.random.randn(n_inpts, in_feats).T
# X = np.random.randn(in_feats, n_inpts)
W_x = np.random.randn(in_feats, n_hidden).T
# W_x = np.random.randn(n_hidden, in_feats)
bias_h = np.random.randn(n_hidden, 1) # column vector
H = np.dot(W_x, X) + bias_h
# H is h x n (one column per example)
H = np.maximum(0, H)  # elementwise ReLU (vectorized, no need for np.vectorize)
W_h = np.random.randn(n_hidden, out_feats).T
# W_h = np.random.randn(out_feats, n_hidden)
bias_o = np.random.randn(out_feats, 1)
output = np.dot(W_h, H) + bias_o
print(output)
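As a sanity check, the two conventions can be run side by side with the same weights; the z = WX + b output should simply be the transpose of the z = XW + b output. A minimal sketch (variable names `out_xw`/`out_wx` are mine):

```python
import numpy as np

np.random.seed(42)  # same seed, so the random numbers match
n_inpts, in_feats, n_hidden, out_feats = 10, 5, 8, 1

# z = XW + b version: one observation per row
X = np.random.randn(n_inpts, in_feats)
W_x = np.random.randn(in_feats, n_hidden)
bias_h = np.random.randn(1, n_hidden)
H = np.maximum(0, X @ W_x + bias_h)          # ReLU, shape (n_inpts, n_hidden)
W_h = np.random.randn(n_hidden, out_feats)
bias_o = np.random.randn(1, out_feats)
out_xw = H @ W_h + bias_o                    # shape (n_inpts, out_feats)

# z = WX + b version: the same matrices, transposed
H2 = np.maximum(0, W_x.T @ X.T + bias_h.T)   # shape (n_hidden, n_inpts)
out_wx = W_h.T @ H2 + bias_o.T               # shape (out_feats, n_inpts)

print(np.allclose(out_wx, out_xw.T))         # True: same network, transposed
```

This works because (AB)ᵀ = BᵀAᵀ and ReLU is applied elementwise, so transposing every matrix transposes every intermediate result.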
Some people like to think of the "default" vector as a column. If you start thinking about your perceptron/network from the point of view of a single input example, X has shape (features, 1), the same as the bias, and later observations are added as more columns. This can be pedagogical, but the resulting X is the transpose of the classic "spreadsheet" representation, which has one observation per row and one feature per column.
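To make the column-vector picture concrete, here is a tiny sketch (the feature values are arbitrary):

```python
import numpy as np

# one observation: a column vector of shape (features, 1)
x1 = np.array([[0.5], [1.0], [-2.0]])            # 3 features

# a second observation is added as another column
x2 = np.array([[1.5], [0.0], [3.0]])
X = np.hstack([x1, x2])

print(X.shape)    # (3, 2): features x observations
print(X.T.shape)  # (2, 3): the classic "spreadsheet" layout, observations x features
```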