I have written an MLP (multilayer perceptron) neural network for a binary classification dataset and am getting 0.88 (88%) accuracy on my training set. My test set only gives me 0.37 - 0.55 accuracy.
I noticed this seems to be because my parameters are not being updated by the update_parameters function shown below:
def update_parameters(parameters, grads, lr):
    param1 = parameters
    L = len(parameters) // 2
    for l in range(L):
        # gradient-descent step for each layer's weights and biases
        parameters["W" + str(l+1)] = parameters["W" + str(l+1)] - lr * grads["dW" + str(l+1)]
        parameters["b" + str(l+1)] = parameters["b" + str(l+1)] - lr * grads["db" + str(l+1)]
    print(param1 == parameters)
    return parameters
The comparison in the function above printed True for all of the initial vs. updated values.
update_parameters is called from the following function:
import matplotlib.pyplot as plt

def ann(X, Y, dimensions, lr, lr_decay, batch_size, epochs, loss, activations, gradient_alg):
    L = len(dimensions)  # number of layers in the neural network
    m = X.shape[1]       # number of training examples
    costs = []           # to keep track of the cost
    parameters = initialize_parameters(dimensions)
    param1 = parameters
    if gradient_alg == "b":  # "b" selects full-batch gradient descent
        batch_size = X.shape[1]
    for i in range(epochs):
        minibatches = random_mini_batches(X, Y, batch_size)
        cost_total = 0
        for minibatch in minibatches:
            (minibatch_X, minibatch_Y) = minibatch
            last_A, caches = forward_prop_layers(minibatch_X, parameters, activations)
            cost_total += compute_cost(last_A, minibatch_Y, loss)
            gradients = backward_prop_layers(last_A, minibatch_Y, caches, activations)
            parameters = update_parameters(parameters, gradients, lr)
        cost_avg = cost_total / m
        if i % 10 == 0:
            print("Cost after epoch %i: %f" % (i, cost_avg))
            costs.append(cost_avg)
    plt.plot(costs)
    plt.ylabel('cost')
    plt.xlabel('epochs')
    plt.title("Learning rate = " + str(lr))
    plt.show()
    parameters1 = [parameters, param1, dimensions, activations, costs, lr, batch_size]
    return parameters1
Is my function not being called properly? Where exactly am I going wrong in my implementation?
Oh yeah, here's why it's returning True. First you assign param1 to parameters. Then you update parameters. But assignment in Python never copies: param1 and parameters are two names for the same dict object, so even after updating parameters, param1 still points to the same memory location as parameters. (The == comparison doesn't even look at the array values here; because both sides hold the very same objects, Python's equality check short-circuits on identity.) Try printing out some parameters before and after updating and check manually whether they are changing, or create a copy of parameters using deepcopy, which copies everything in parameters to a separate memory location.
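You can watch the aliasing happen with a toy dict (hypothetical names, just for illustration):

    import numpy as np

    params = {"W1": np.zeros((2, 2))}
    snapshot = params                  # no copy: snapshot is the same dict object
    params["W1"] = params["W1"] + 1.0  # rebind the entry inside the shared dict
    print(snapshot is params)          # True -> both names refer to one object
    print(snapshot["W1"][0, 0])        # 1.0 -> the "snapshot" sees the update too

With deepcopy the snapshot is genuinely independent: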
from copy import deepcopy
import numpy as np

def update_parameters(parameters, grads, lr):
    param1 = deepcopy(parameters)  # independent copy of the pre-update values
    L = len(parameters) // 2
    for l in range(L):
        parameters["W" + str(l+1)] = parameters["W" + str(l+1)] - lr * grads["dW" + str(l+1)]
        parameters["b" + str(l+1)] = parameters["b" + str(l+1)] - lr * grads["db" + str(l+1)]
    # Compare array by array: a plain param1 == parameters would try to
    # truth-test whole NumPy arrays and raise a ValueError once the copies differ.
    unchanged = all(np.array_equal(param1[k], parameters[k]) for k in parameters)
    print(unchanged)  # should print False whenever the gradients are nonzero
    return parameters
Also try printing out the loss after each iteration. If it is changing, then the parameters are getting updated; if not, then your parameters aren't getting updated properly.
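If you want a more direct check than the printed loss, here is a small sketch (reusing the names from your training loop; nothing new beyond NumPy) that snapshots one weight matrix before the update and measures how far it moved:

    import numpy as np

    # inside the minibatch loop, around the existing update call:
    w_before = parameters["W1"].copy()  # a real copy of one layer's weights
    parameters = update_parameters(parameters, gradients, lr)
    delta = np.abs(parameters["W1"] - w_before).max()
    print("max change in W1 this step:", delta)  # nonzero -> the weights really moved

If delta is consistently nonzero, the update step is working, and the gap between your 0.88 train accuracy and 0.37 - 0.55 test accuracy is more likely overfitting than a broken optimizer.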