Tags: python, numpy, gradient-descent, sigmoid, theta360

Need help computing the cost for a gradient descent function


I was following Week 1 of Andrew Ng's course on Natural Language Processing and trying to fill in the components of a function that performs gradient descent.

The gradientDescent function template is given as follows:

# UNQ_C2 (UNIQUE CELL IDENTIFIER, DO NOT EDIT)
def gradientDescent(x, y, theta, alpha, num_iters):
    '''
    Input:
        x: matrix of features which is (m,n+1)
        y: corresponding labels of the input matrix x, dimensions (m,1)
        theta: weight vector of dimension (n+1,1)
        alpha: learning rate
        num_iters: number of iterations you want to train your model for
    Output:
        J: the final cost
        theta: your final weight vector
    Hint: you might want to print the cost to make sure that it is going down.
    '''
    ### START CODE HERE (REPLACE INSTANCES OF 'None' with your code) ###
    # get 'm', the number of rows in matrix x
    m = None
    
    for i in range(0, num_iters):
        
        # get z, the dot product of x and theta
        z = None
        
        # get the sigmoid of z
        h = None
        
        # calculate the cost function
        J = None

        # update the weights theta
        theta = None
        
    ### END CODE HERE ###
    J = float(J)
    return J, theta
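
The template's comment "get the sigmoid of z" relies on a sigmoid helper that is not shown in this post. A minimal sketch of such a helper, assuming the usual logistic definition 1 / (1 + exp(-z)) (the course's own version may differ in details):

```python
import numpy as np

def sigmoid(z):
    # Logistic function applied elementwise; z may be a scalar or a NumPy array.
    return 1.0 / (1.0 + np.exp(-z))

print(sigmoid(0))                                  # 0.5
print(sigmoid(np.array([[-1.0], [1.0]])).shape)    # (2, 1)
```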

To check the function, the following test case is provided:

# Check the function
# Construct a synthetic test case using numpy PRNG functions
np.random.seed(1)
# X input is 10 x 3 with ones for the bias terms
tmp_X = np.append(np.ones((10, 1)), np.random.rand(10, 2) * 2000, axis=1)
# Y Labels are 10 x 1
tmp_Y = (np.random.rand(10, 1) > 0.35).astype(float)

# Apply gradient descent
tmp_J, tmp_theta = gradientDescent(tmp_X, tmp_Y, np.zeros((3, 1)), 1e-8, 700)
print(f"The cost after training is {tmp_J:.8f}.")
print(f"The resulting vector of weights is {[round(t, 8) for t in np.squeeze(tmp_theta)]}")

When all the components of the underlying equations are entered correctly, the expected output is the following:

The cost after training is 0.67094970.
The resulting vector of weights is [4.1e-07, 0.00035658, 7.309e-05]

The cost function J is computed from the dot products of 'y' with 'log(h)' and of '(1 - y)' with 'log(1 - h)'. Since 'y' and 'h' are both column vectors of shape (m,1), transpose the left-hand vector so that multiplying a row vector by a column vector performs the dot product:

$$J = -\frac{1}{m} \left( \mathbf{y}^T \cdot \log(\mathbf{h}) + (1 - \mathbf{y})^T \cdot \log(1 - \mathbf{h}) \right)$$
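
As a rough illustration of this formula, the sketch below evaluates it with NumPy on a few made-up values and checks that the transposed dot-product form agrees with an elementwise mean. The name `log_loss` and the sample numbers are assumptions for illustration only; they are not part of the assignment.

```python
import numpy as np

def log_loss(h, y):
    # h, y: column vectors of shape (m, 1); h holds probabilities strictly between 0 and 1.
    m = y.shape[0]
    # y.T @ log(h) is a (1, m) x (m, 1) product, i.e. the dot product of the two vectors.
    J = -(1.0 / m) * (y.T @ np.log(h) + (1 - y).T @ np.log(1 - h))
    return J.item()  # unwrap the (1, 1) result into a Python float

# Made-up labels and predictions, for illustration only.
y = np.array([[1.0], [0.0], [1.0]])
h = np.array([[0.9], [0.2], [0.6]])

elementwise = -np.mean(y * np.log(h) + (1 - y) * np.log(1 - h))
print(np.isclose(log_loss(h, y), elementwise))  # True
```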

I was able to fill in everything except the cost function, which produces a higher value than expected: my current implementation returns a cost of 0.81721852, whereas the expected value for the test case is 0.67094970.

# UNQ_C2 (UNIQUE CELL IDENTIFIER, DO NOT EDIT)
def gradientDescent(x, y, theta, alpha, num_iters):
    '''
    Input:
        x: matrix of features which is (m,n+1)
        y: corresponding labels of the input matrix x, dimensions (m,1)
        theta: weight vector of dimension (n+1,1)
        alpha: learning rate
        num_iters: number of iterations you want to train your model for
    Output:
        J: the final cost
        theta: your final weight vector
    Hint: you might want to print the cost to make sure that it is going down.
    '''
    ### START CODE HERE (REPLACE INSTANCES OF 'None' with your code) ###
    # get 'm', the number of rows in matrix x
    m = len(x)
    xT = x.transpose()
    for i in range(0, num_iters):
        
        # get z, the dot product of x and theta
        z = np.dot(x, theta)
        
        # get the sigmoid of z
        h = sigmoid(z)
        
        # calculate the loss
        # loss = (h - y)
        
        # calculate the gradient
        # gradient = np.dot(xT, loss)
        
        # calculate the cost function
        J = np.sum((np.log(h) - y) ** 2) / (2 * m)     
        #print("Iters %d | J: %f" % (i, J))
    
        # update the weights theta
        theta = theta - ( (alpha/m) *  np.dot(xT, (h - y)) )
        
    ### END CODE HERE ###
    J = float(J)
    return J, theta

How do I modify my cost calculation to get the expected value of 0.67094970 instead of the 0.81721852 I am getting now?


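Solution

The weight update in the question is already the same as the one used here; only the cost line needs to change. The question computes a squared error between log(h) and y, whereas the expected value comes from the log-loss formula quoted in the question.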
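
To see how far apart the two cost expressions can be, here is a small sketch that evaluates both on made-up labels with every prediction fixed at 0.5, which is what the sigmoid returns when theta is all zeros. The labels are illustrative and are not the assignment's test data.

```python
import numpy as np

# Illustrative values only (not the assignment's test data).
y = np.array([[1.0], [0.0], [1.0], [1.0]])
h = np.full((4, 1), 0.5)   # predictions when theta is all zeros
m = len(y)

# Cost as written in the question: squared error between log(h) and y.
J_question = np.sum((np.log(h) - y) ** 2) / (2 * m)

# Log-loss, following the formula quoted in the question.
J_log_loss = (-(1 / m) * (y.T.dot(np.log(h)) + (1 - y).T.dot(np.log(1 - h)))).item()

print(J_question, J_log_loss)
```

With h = 0.5 everywhere, the log-loss equals ln 2 (about 0.6931) whatever the labels are, which is consistent with the expected final cost of 0.67094970 sitting slightly below that starting value.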

    # UNQ_C2 (UNIQUE CELL IDENTIFIER, DO NOT EDIT)
    def gradientDescent(x, y, theta, alpha, num_iters):
        '''
        Input:
            x: matrix of features which is (m,n+1)
            y: corresponding labels of the input matrix x, dimensions (m,1)
            theta: weight vector of dimension (n+1,1)
            alpha: learning rate
            num_iters: number of iterations you want to train your model for
        Output:
            J: the final cost
            theta: your final weight vector
        Hint: you might want to print the cost to make sure that it is going down.
        '''
        ### START CODE HERE (REPLACE INSTANCES OF 'None' with your code) ###
        # get 'm', the number of rows in matrix x
        m = len(x)
        xT = x.transpose()
        yT = y.transpose()
        for i in range(0, num_iters):
            
            # get z, the dot product of x and theta
            z = np.dot(x, theta)
            
            # get the sigmoid of z
            h = sigmoid(z)        
            
            # calculate the cost function
            J = - (1/m) * (yT.dot(np.log(h)) + (1-yT).dot(np.log(1-h)))
        
            # update the weights theta
            theta = theta - ((alpha/m) *  xT.dot(h - y))
            
        ### END CODE HERE ###
        J = float(J)
        return J, theta
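
For anyone who wants to run the check outside the course notebook, here is a self-contained sketch of the same steps applied to the test case from the question. The `sigmoid` helper is an assumption (it is not shown in the post), and `gradient_descent` is a hypothetical name chosen so it does not clash with the graded function.

```python
import numpy as np

def sigmoid(z):
    # Assumed helper (not shown in the post): elementwise logistic function.
    return 1.0 / (1.0 + np.exp(-z))

def gradient_descent(x, y, theta, alpha, num_iters):
    # Same steps as the solution above, under a hypothetical snake_case name.
    m = x.shape[0]
    for _ in range(num_iters):
        h = sigmoid(x @ theta)                       # predictions, shape (m, 1)
        J = -(1.0 / m) * (y.T @ np.log(h) + (1 - y).T @ np.log(1 - h))
        theta = theta - (alpha / m) * (x.T @ (h - y))
    return J.item(), theta                           # J is (1, 1); unwrap to a float

# Test case copied from the question.
np.random.seed(1)
tmp_X = np.append(np.ones((10, 1)), np.random.rand(10, 2) * 2000, axis=1)
tmp_Y = (np.random.rand(10, 1) > 0.35).astype(float)

tmp_J, tmp_theta = gradient_descent(tmp_X, tmp_Y, np.zeros((3, 1)), 1e-8, 700)
print(f"The cost after training is {tmp_J:.8f}.")
print([round(float(t), 8) for t in np.squeeze(tmp_theta)])
```

According to the question, the expected cost for this test case is 0.67094970. The sketch uses `.item()` rather than `float(J)` because recent NumPy releases deprecate calling `float()` on an array that is not zero-dimensional; inside the graded cell, the provided `J = float(J)` line should be left as is.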