I have been trying to build a simple neural network myself (3 layers) to predict the MNIST dataset. I referenced some code online and wrote some parts on my own. The code runs without any errors, but something is wrong with the learning process: the trained network always gives me wrong predictions, and one or two classes always have very high probability no matter what I pass in as input. I have tried to figure out the problem but have made no progress in a few days. Could anyone give me some hints about where I went wrong?
import numpy as np
from PIL import Image
import os
np.set_printoptions(formatter={'float_kind':'{:f}'.format})
def init_setup():
    # three-layer perceptron
    w1=np.random.randn(10,784)-0.8
    b1=np.random.rand(10,1)-0.8
    # second layer
    w2=np.random.randn(10,10)-0.8
    b2=np.random.randn(10,1)-0.8
    # third layer
    w3=np.random.randn(10,10)-0.8
    b3=np.random.randn(10,1)-0.8
    return w1,b1,w2,b2,w3,b3
def activate(A):
    # use ReLU function as the activation function
    Z=np.maximum(0,A)
    return Z

def softmax(Z):
    return np.exp(Z)/np.sum(np.exp(Z))

def forward_propagation(A,w1,b1,w2,b2,w3,b3):
    # input A: (784,1) -> A1: (10,1) -> A2: (10,1) -> prob: (10,1)
    z1=w1@A+b1
    A1=activate(z1)
    z2=w2@A1+b2
    A2=activate(z2)
    z3=w3@A2+b3
    prob=softmax(z3)
    return z1,A1,z2,A2,z3,prob

def one_hot(Y:np.ndarray)->np.ndarray:
    one_hot=np.zeros((10,1)).astype(int)
    one_hot[Y]=1
    return one_hot
def back_propagation(A,z1,A1:np.ndarray,z2,A2:np.ndarray,z3,prob,w1,w2:np.ndarray,w3,Y:np.ndarray,lr:float):
    m=1/Y.size
    dz3=prob-Y
    dw3=m*dz3@A2.T
    db3=dz3
    dz2=ReLU_deriv(z2)*w3.T@dz3
    dw2=dz2@A1.T
    db2=dz2
    dz1=ReLU_deriv(z1)*w2.T@dz2
    dw1=dz1@A.T
    db1=dz1
    return db1,dw1,dw2,db2,dw3,db3

def ReLU_deriv(Z):
    Z[Z>0]=1
    Z[Z<=0]=0
    return Z

def step(lr,w1,b1,w2,b2,w3,b3,dw1,db1,dw2,db2,dw3,db3):
    w1 = w1 - lr * dw1
    b1 = b1 - lr * db1
    w2 = w2 - lr * dw2
    b2 = b2 - lr * db2
    w3 = w3 - lr * dw3
    b3 = b3 - lr * db3
    return w1,b1,w2,b2,w3,b3
# put functions together
def learn():
    lr=0.5
    dir=r'C:\Users\Desktop\MNIST - JPG - training\{}'
    w1,b1,w2,b2,w3,b3=init_setup()
    for e in range(10):
        if e%3 == 0:
            lr=lr/10
        for num in range(10):
            Y=one_hot(num)
            # print(Y)
            path=dir.format(str(num))
            for i in os.listdir(path):
                img=Image.open(path+'\\'+i)
                A=np.asarray(img)
                A=A.reshape(-1,1)
                z1,A1,z2,A2,z3,prob=forward_propagation(A,w1,b1,w2,b2,w3,b3)
                # print('loss='+str(np.sum(np.abs(Y-prob))))
                db1,dw1,dw2,db2,dw3,db3=back_propagation(A,z1,A1,z2,A2,z3,prob,w1,w2,w3,Y,lr)
                w1,b1,w2,b2,w3,b3=step(lr,w1,b1,w2,b2,w3,b3,dw1,db1,dw2,db2,dw3,db3)
    return w1,b1,w2,b2,w3,b3
optimize_params=learn()
w1,b1,w2,b2,w3,b3=optimize_params
img=Image.open(r'C:\Users\Desktop\MNIST - JPG - training\2\5.jpg')
A=np.asarray(img)
A=A.reshape(-1,1)
z1,A1,z2,A2,z3,prob=forward_propagation(A,w1,b1,w2,b2,w3,b3)
print(prob)
print(np.argmax(prob))
After running the learn function, the network gave me something like this:
>>>[[0.040939]
[0.048695]
[0.048555]
[0.054962]
[0.060614]
[0.066957]
[0.086470]
[0.117370]
[0.163163]
[0.312274]]
>>>9
The result is obviously wrong; the true label should be 2, but as you can see in the probabilities, class 2 has an extremely low value, so I believe there must be something wrong in the learning process. I have no clue at all. Can someone please give me some hints?
Your current code trains the network one class at a time, in a fixed order:

for num in range(10):

so the network sees every image of one digit before it ever sees the next digit, and the last gradients of every epoch all come from digit 9. Because the model is trained in such an ordered way, it ends up biased towards the last classes, as these are the last ones it saw during training (which is why class 9 gets the highest probability). You should shuffle your training data in each epoch and not feed the network class-wise.
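For example, here is a rough sketch of what a shuffled training loop could look like, reusing the init_setup, one_hot, forward_propagation, back_propagation and step functions from your post (the samples list, the fixed lr, and the file handling are just illustrative, not the only way to do it):

import os
import random
import numpy as np
from PIL import Image

dir = r'C:\Users\Desktop\MNIST - JPG - training\{}'

# collect every (file path, label) pair once, across all ten classes
samples = []
for num in range(10):
    path = dir.format(str(num))
    for fname in os.listdir(path):
        samples.append((path + '\\' + fname, num))

w1, b1, w2, b2, w3, b3 = init_setup()
lr = 0.05  # choose or schedule the learning rate however you prefer
for e in range(10):
    random.shuffle(samples)  # new random order every epoch
    for img_path, num in samples:
        Y = one_hot(num)
        A = np.asarray(Image.open(img_path)).reshape(-1, 1)
        z1, A1, z2, A2, z3, prob = forward_propagation(A, w1, b1, w2, b2, w3, b3)
        db1, dw1, dw2, db2, dw3, db3 = back_propagation(A, z1, A1, z2, A2, z3, prob, w1, w2, w3, Y, lr)
        w1, b1, w2, b2, w3, b3 = step(lr, w1, b1, w2, b2, w3, b3, dw1, db1, dw2, db2, dw3, db3)

This way every gradient step can come from any class, so no single class dominates the end of training.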