I'm trying to implement a perceptron in Python using NumPy. When I use the notation z = XW + b, everything works fine. While studying ML, though, I often see z = WX + b, especially in the context of neural networks. The problem is that with my matrices the dimensions don't add up. I tried following some answers on the web, but the output doesn't have the right dimensions, and ChatGPT only gave me code following the z = XW + b notation. This is the code I used for z = XW + b:
import numpy as np
n_inpts = 10
in_feats = 5
n_hidden = 8
out_feats = 1
X = np.random.randn(n_inpts,in_feats)
W_x = np.random.randn(in_feats, n_hidden)
bias_h = np.random.randn(1, n_hidden)
H = np.dot(X,W_x) + bias_h
#H is nxh
H = np.maximum(0, H)  # elementwise ReLU (vectorized, no need for np.vectorize)
W_h = np.random.randn(n_hidden, out_feats)
bias_o = np.random.randn(1, out_feats)
output = np.dot(H, W_h) + bias_o
Can anybody give me an implementation that produces the same result using z = WX + b? Every single implementation I have found follows the z = XW + b notation. I guess it comes down to how you define the X and W matrices, but so far I have had no luck finding a solution.
Just transpose everything!
In the example I transpose each W and X so that they contain the same random numbers as in your version, which lets you compare the results. Normally, though, you would define each variable already transposed, as in the commented lines.
import numpy as np
np.random.seed(42) # for reproducible results
n_inpts = 10
in_feats = 5
n_hidden = 8
out_feats = 1
X = np.random.randn(n_inpts, in_feats).T
# X = np.random.randn(in_feats, n_inpts)
W_x = np.random.randn(in_feats, n_hidden).T
# W_x = np.random.randn(n_hidden, in_feats)
bias_h = np.random.randn(n_hidden, 1) # column vector
H = np.dot(W_x, X) + bias_h
# H is h x n (one column per example)
H = np.maximum(0, H)  # elementwise ReLU (vectorized, no need for np.vectorize)
W_h = np.random.randn(n_hidden, out_feats).T
# W_h = np.random.randn(out_feats, n_hidden)
bias_o = np.random.randn(out_feats, 1)
output = np.dot(W_h, H) + bias_o
print(output)
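As a sanity check, the two conventions can be run side by side with the same weights; the z = WX + b output should simply be the transpose of the z = XW + b output. A minimal sketch (variable names `out_xw`/`out_wx` are mine):

```python
import numpy as np

np.random.seed(42)  # same seed, so the random numbers match
n_inpts, in_feats, n_hidden, out_feats = 10, 5, 8, 1

# z = XW + b version: one observation per row
X = np.random.randn(n_inpts, in_feats)
W_x = np.random.randn(in_feats, n_hidden)
bias_h = np.random.randn(1, n_hidden)
H = np.maximum(0, X @ W_x + bias_h)          # ReLU, shape (n_inpts, n_hidden)
W_h = np.random.randn(n_hidden, out_feats)
bias_o = np.random.randn(1, out_feats)
out_xw = H @ W_h + bias_o                    # shape (n_inpts, out_feats)

# z = WX + b version: the same matrices, transposed
H2 = np.maximum(0, W_x.T @ X.T + bias_h.T)   # shape (n_hidden, n_inpts)
out_wx = W_h.T @ H2 + bias_o.T               # shape (out_feats, n_inpts)

print(np.allclose(out_wx, out_xw.T))         # True: same network, transposed
```

This works because (AB)ᵀ = BᵀAᵀ and ReLU is applied elementwise, so transposing every matrix transposes every intermediate result.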
Some people like to think of the "default" vector as a column. If you start thinking about your perceptron/network from the point of view of a single input example, X has shape (features, 1), the same as the bias, and later observations are added as more columns. This can be pedagogical, but the resulting X is the transpose of the classic "spreadsheet" representation, which has one observation per row and one feature per column.
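To make the column-vector picture concrete, here is a tiny sketch (the feature values are arbitrary):

```python
import numpy as np

# one observation: a column vector of shape (features, 1)
x1 = np.array([[0.5], [1.0], [-2.0]])            # 3 features

# a second observation is added as another column
x2 = np.array([[1.5], [0.0], [3.0]])
X = np.hstack([x1, x2])

print(X.shape)    # (3, 2): features x observations
print(X.T.shape)  # (2, 3): the classic "spreadsheet" layout, observations x features
```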