Tags: python, jupyter-notebook, mlp

Update Parameters method gives the same initial and updated values - MLP ANN


I have written an MLP ANN for a binary classification dataset and am getting 0.88 (88%) accuracy on my training set, but only 0.37 - 0.55 accuracy on my test set.

I noticed this is because my parameters are not actually being updated by the update_parameters function shown below:

def update_parameters(parameters, grads, lr):

    param1 = parameters
    L = len(parameters) // 2

    for l in range(L):
        parameters["W" + str(l+1)] = parameters["W" + str(l+1)] - lr * grads["dW" + str(l+1)]
        parameters["b" + str(l+1)] = parameters["b" + str(l+1)] - lr * grads["db" + str(l+1)]

    print(param1 == parameters)

    return parameters

The comparison in the above function prints True, i.e. the initial and updated parameter values appear identical.

The update_parameters function is called from the following function:

def ann(X, Y, dimensions, lr, lr_decay, batch_size, epochs, loss, activations, gradient_alg):
    L = len(dimensions)              # number of layers in the neural network
    m = X.shape[1]
    costs = []                       # to keep track of the cost

    parameters = initialize_parameters(dimensions)
    param1 = parameters

    if (gradient_alg == "b"):
        batch_size = X.shape[1]

    for i in range(epochs):
        minibatches = random_mini_batches(X, Y, batch_size)
        cost_total = 0

        for minibatch in minibatches:

            (minibatch_X, minibatch_Y) = minibatch
            last_A, caches = forward_prop_layers(minibatch_X, parameters, activations)

            cost_total += compute_cost(last_A, minibatch_Y, loss)

            gradients = backward_prop_layers(last_A, minibatch_Y, caches, activations)

            parameters = update_parameters(parameters, gradients, lr)

        cost_avg = cost_total / m

        if i % 10 == 0:
            print("Cost after epoch %i: %f" % (i, cost_avg))
        costs.append(cost_avg)

    plt.plot(costs)
    plt.ylabel('cost')
    plt.xlabel('epochs')
    plt.title("Learning rate = " + str(lr))
    plt.show()

    parameters1 = [parameters, param1, dimensions, activations, costs, lr, batch_size]

    return parameters1

Is my function not being called properly? Where exactly am I going wrong in my implementation?


Solution

  • Oh yeah, here's why it's returning True. You first assign parameters to param1, and then you update parameters. But param1 is just another name bound to the same dictionary object, so even after the update, param1 still points to the same memory location as parameters and the two will always compare equal. In Python, assignment binds a name to an object; it never copies the object. Try printing some parameter values before and after the update and checking manually whether they change, or create a real copy of parameters using deepcopy, which copies everything into a separate memory location (see the small aliasing sketch at the end of this answer):

    from copy import deepcopy
    import numpy as np

    def update_parameters(parameters, grads, lr):
        param1 = deepcopy(parameters)   # independent snapshot of the old values
        L = len(parameters) // 2
        for l in range(L):
            parameters["W" + str(l+1)] = parameters["W" + str(l+1)] - lr * grads["dW" + str(l+1)]
            parameters["b" + str(l+1)] = parameters["b" + str(l+1)] - lr * grads["db" + str(l+1)]
        # compare the array contents explicitly; a plain `param1 == parameters`
        # on dicts of NumPy arrays raises an "ambiguous truth value" error
        print(all(np.array_equal(param1[k], parameters[k]) for k in parameters))
        return parameters
    

    Also try printing the loss after each iteration: if it changes, the parameters are being updated; if not, your parameters aren't being updated properly (a small sketch of a more direct check follows below).
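    As a minimal, self-contained sketch of the aliasing behaviour described above (toy values, not your actual network parameters):

    import numpy as np
    from copy import deepcopy

    params = {"W1": np.array([[1.0, 2.0]]), "b1": np.array([[0.5]])}

    alias = params                 # just another name for the SAME dict object
    snapshot = deepcopy(params)    # an independent copy in separate memory

    params["W1"] = params["W1"] - 0.1    # "update" the parameters

    print(alias["W1"])       # [[0.9 1.9]] -- the alias sees the update
    print(snapshot["W1"])    # [[1. 2.]]   -- the deepcopy keeps the old values
    print(alias is params, snapshot is params)    # True False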
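    If you want a check that doesn't rely on the loss, you can also print how far the weights actually move in one update step. A small sketch, assuming the parameters are a dict of NumPy arrays keyed "W1", "b1", ... as in your code (param_change is a hypothetical helper, not part of your code):

    import numpy as np

    def param_change(before, after):
        # total L2 norm of the difference between two parameter dicts;
        # 0.0 means nothing was updated at all
        return sum(np.linalg.norm(after[k] - before[k]) for k in before)

    # usage inside the training loop:
    #     before = deepcopy(parameters)
    #     parameters = update_parameters(parameters, gradients, lr)
    #     print(param_change(before, parameters))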