I was following Andrew Ng's course on Natural Language Processing (Week 1) and trying to fill in the components of a function that performs gradient descent.
The `gradientDescent` function is given as follows:
```python
# UNQ_C2 (UNIQUE CELL IDENTIFIER, DO NOT EDIT)
def gradientDescent(x, y, theta, alpha, num_iters):
    '''
    Input:
        x: matrix of features which is (m,n+1)
        y: corresponding labels of the input matrix x, dimensions (m,1)
        theta: weight vector of dimension (n+1,1)
        alpha: learning rate
        num_iters: number of iterations you want to train your model for
    Output:
        J: the final cost
        theta: your final weight vector
    Hint: you might want to print the cost to make sure that it is going down.
    '''
    ### START CODE HERE (REPLACE INSTANCES OF 'None' with your code) ###
    # get 'm', the number of rows in matrix x
    m = None
    for i in range(0, num_iters):
        # get z, the dot product of x and theta
        z = None
        # get the sigmoid of z
        h = None
        # calculate the cost function
        J = None
        # update the weights theta
        theta = None
    ### END CODE HERE ###
    J = float(J)
    return J, theta
```
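To make the shapes in the skeleton concrete, here is a minimal sketch of a single loop iteration. It assumes the standard logistic `sigmoid` (the notebook defines its own `sigmoid` in an earlier cell; the one below is a stand-in), and the helper name `one_step` and the toy data are hypothetical.

```python
import numpy as np

def sigmoid(z):
    # Standard logistic function; stand-in for the notebook's own sigmoid
    return 1.0 / (1.0 + np.exp(-z))

def one_step(x, y, theta, alpha):
    """Hypothetical helper: one gradient-descent iteration for logistic regression."""
    m = x.shape[0]                  # number of rows (training examples)
    z = x @ theta                   # (m, n+1) @ (n+1, 1) -> (m, 1)
    h = sigmoid(z)                  # predicted probabilities, (m, 1)
    grad = x.T @ (h - y) / m        # (n+1, m) @ (m, 1) -> (n+1, 1)
    return z, h, grad, theta - alpha * grad

x = np.array([[1.0, 2.0, 3.0],
              [1.0, -1.0, 0.5],
              [1.0, 0.0, -2.0],
              [1.0, 4.0, 1.0]])     # m = 4 rows, n + 1 = 3 columns (first is the bias)
y = np.array([[1.0], [0.0], [0.0], [1.0]])
theta = np.zeros((3, 1))

z, h, grad, new_theta = one_step(x, y, theta, alpha=0.1)
print(z.shape, h.shape, grad.shape, new_theta.shape)  # (4, 1) (4, 1) (3, 1) (3, 1)
```

Every intermediate stays a column vector, which is what lets the cost be written as a row-times-column product once `y` is transposed.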
To test the function's accuracy, the following test case is provided:
```python
import numpy as np

# Check the function
# Construct a synthetic test case using numpy PRNG functions
np.random.seed(1)
# X input is 10 x 3 with ones for the bias terms
tmp_X = np.append(np.ones((10, 1)), np.random.rand(10, 2) * 2000, axis=1)
# Y Labels are 10 x 1
tmp_Y = (np.random.rand(10, 1) > 0.35).astype(float)

# Apply gradient descent
tmp_J, tmp_theta = gradientDescent(tmp_X, tmp_Y, np.zeros((3, 1)), 1e-8, 700)
print(f"The cost after training is {tmp_J:.8f}.")
print(f"The resulting vector of weights is {[round(t, 8) for t in np.squeeze(tmp_theta)]}")
```
When all the components of the underlying equations are entered correctly, the expected output is:
```
The cost after training is 0.67094970.
The resulting vector of weights is [4.1e-07, 0.00035658, 7.309e-05]
```
Technically, the cost function $J$ is calculated by taking the dot product of the vectors `y` and `log(h)`. Since both `y` and `h` are column vectors of shape (m,1), transpose the vector on the left, so that multiplying a row vector by a column vector performs the dot product:

$$J = -\frac{1}{m} \times \left(\mathbf{y}^T \cdot \log(\mathbf{h}) + (1-\mathbf{y})^T \cdot \log(1-\mathbf{h})\right)$$
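As a sanity check on the transpose trick, this sketch evaluates the formula in its row-times-column form and again as a plain element-wise mean, using random toy values for `y` and `h` (assumptions for illustration only). The two should agree; the only difference is that the vectorized form returns a (1, 1) array.

```python
import numpy as np

rng = np.random.default_rng(0)
m = 5
y = (rng.random((m, 1)) > 0.5).astype(float)   # (m, 1) labels of 0/1
h = rng.uniform(0.05, 0.95, size=(m, 1))       # (m, 1) probabilities strictly inside (0, 1)

# Vectorized form: (1, m) row vector times (m, 1) column vector -> (1, 1)
J_vec = -(1 / m) * (y.T @ np.log(h) + (1 - y).T @ np.log(1 - h))

# The same cost written as an element-wise mean over the m examples
J_loop = -np.mean(y * np.log(h) + (1 - y) * np.log(1 - h))

print(J_vec.shape, J_vec.item(), J_loop)
```

The (1, 1) shape is why the notebook ends with `J = float(J)`; `J.item()` is an equivalent way to pull out the scalar.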
I was able to derive almost all of the equations except the cost function, which gives a higher value than expected: my current cost function produces 0.81721852, while the expected value for the test variables is 0.67094970.
```python
# UNQ_C2 (UNIQUE CELL IDENTIFIER, DO NOT EDIT)
def gradientDescent(x, y, theta, alpha, num_iters):
    '''
    Input:
        x: matrix of features which is (m,n+1)
        y: corresponding labels of the input matrix x, dimensions (m,1)
        theta: weight vector of dimension (n+1,1)
        alpha: learning rate
        num_iters: number of iterations you want to train your model for
    Output:
        J: the final cost
        theta: your final weight vector
    Hint: you might want to print the cost to make sure that it is going down.
    '''
    ### START CODE HERE (REPLACE INSTANCES OF 'None' with your code) ###
    # get 'm', the number of rows in matrix x
    m = len(x)
    xT = x.transpose()
    for i in range(0, num_iters):
        # get z, the dot product of x and theta
        z = np.dot(x, theta)
        # get the sigmoid of z
        h = sigmoid(z)
        # calculate the loss
        # loss = (h - y)
        # calculate the gradient
        # gradient = np.dot(xT, loss)
        # calculate the cost function
        J = np.sum((np.log(h) - y) ** 2) / (2 * m)
        #print("Iters %d | J: %f" % (i, J))
        # update the weights theta
        theta = theta - ( (alpha/m) * np.dot(xT, (h - y)) )
    ### END CODE HERE ###
    J = float(J)
    return J, theta
```
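The weight update in this version, `theta - (alpha/m) * np.dot(xT, (h - y))`, is already the gradient step for the cross-entropy cost, so the weights can come out right even while `J` does not. One way to confirm that $X^T(h - y)/m$ is the gradient of the cross-entropy (and not of the squared expression on the `J` line) is a central finite-difference check. This sketch uses its own `sigmoid`, random toy data, and hypothetical helper names.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def cross_entropy(theta, x, y):
    # Cost from the hint, returned as a Python float
    m = x.shape[0]
    h = sigmoid(x @ theta)
    return (-(1 / m) * (y.T @ np.log(h) + (1 - y).T @ np.log(1 - h))).item()

def analytic_grad(theta, x, y):
    # The term in the question's theta update, without alpha: X^T (h - y) / m
    m = x.shape[0]
    return x.T @ (sigmoid(x @ theta) - y) / m

def numeric_grad(theta, x, y, eps=1e-6):
    # Central differences, perturbing one component of theta at a time
    grad = np.zeros_like(theta)
    for j in range(theta.shape[0]):
        step = np.zeros_like(theta)
        step[j, 0] = eps
        grad[j, 0] = (cross_entropy(theta + step, x, y)
                      - cross_entropy(theta - step, x, y)) / (2 * eps)
    return grad

rng = np.random.default_rng(1)
x = np.hstack([np.ones((6, 1)), rng.normal(size=(6, 2))])  # bias column + 2 features
y = (rng.random((6, 1)) > 0.5).astype(float)
theta = rng.normal(size=(3, 1))

# Largest absolute disagreement between the two gradients (should be tiny)
print(np.max(np.abs(analytic_grad(theta, x, y) - numeric_grad(theta, x, y))))
```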
How do I modify my equation to get the expected value of 0.67094970 instead of the 0.81721852 I am getting now?
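The `J` line in the code above, `np.sum((np.log(h) - y) ** 2) / (2 * m)`, squares the difference between a log-probability and a label, which is neither the squared-error cost nor the cross-entropy from the hint. A quick sketch shows the gap at the starting point θ = 0, where every prediction is h = 0.5 and the cross-entropy has to equal log 2. The labels are made-up toy data and both helper names are hypothetical.

```python
import numpy as np

def cross_entropy_cost(h, y):
    # Logistic-regression (binary cross-entropy) cost from the hint
    m = y.shape[0]
    return (-(1 / m) * (y.T @ np.log(h) + (1 - y).T @ np.log(1 - h))).item()

def question_cost(h, y):
    # The cost used in the question: a squared error on log(h) - y
    m = y.shape[0]
    return np.sum((np.log(h) - y) ** 2) / (2 * m)

y = np.array([[1.0], [0.0], [1.0], [1.0], [0.0]])
h = np.full_like(y, 0.5)   # what sigmoid(x @ theta) gives when theta is all zeros

print(cross_entropy_cost(h, y))   # log(2), about 0.6931
print(question_cost(h, y))
```

Because the two expressions already disagree before any training, no tuning of `alpha` or `num_iters` will bring the second one to the expected value; the `J` line itself has to change.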
Rewriting the cost line to follow the hinted formula (transposing `y` so that each term is a row-times-column dot product) gives this version:
```python
# UNQ_C2 (UNIQUE CELL IDENTIFIER, DO NOT EDIT)
def gradientDescent(x, y, theta, alpha, num_iters):
    '''
    Input:
        x: matrix of features which is (m,n+1)
        y: corresponding labels of the input matrix x, dimensions (m,1)
        theta: weight vector of dimension (n+1,1)
        alpha: learning rate
        num_iters: number of iterations you want to train your model for
    Output:
        J: the final cost
        theta: your final weight vector
    Hint: you might want to print the cost to make sure that it is going down.
    '''
    ### START CODE HERE (REPLACE INSTANCES OF 'None' with your code) ###
    # get 'm', the number of rows in matrix x
    m = len(x)
    xT = x.transpose()
    yT = y.transpose()
    for i in range(0, num_iters):
        # get z, the dot product of x and theta
        z = np.dot(x, theta)
        # get the sigmoid of z
        h = sigmoid(z)
        # calculate the cost function
        J = - (1/m) * (yT.dot(np.log(h)) + (1-yT).dot(np.log(1-h)))
        # update the weights theta
        theta = theta - ((alpha/m) * xT.dot(h - y))
    ### END CODE HERE ###
    J = float(J)
    return J, theta
```
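For a self-contained check outside the notebook, the sketch below combines the cross-entropy version of the function with the provided test case. It assumes the standard logistic `sigmoid` as a stand-in for the notebook's own, and drops the graded-cell comments; the output can be compared against the expected values quoted earlier.

```python
import numpy as np

def sigmoid(z):
    # Stand-in for the notebook's sigmoid (defined in an earlier cell there)
    return 1.0 / (1.0 + np.exp(-z))

def gradientDescent(x, y, theta, alpha, num_iters):
    m = x.shape[0]
    for _ in range(num_iters):
        z = x @ theta
        h = sigmoid(z)
        # Cross-entropy cost as a (1, 1) array, evaluated before this iteration's update
        J = -(1 / m) * (y.T @ np.log(h) + (1 - y).T @ np.log(1 - h))
        # Gradient step: theta <- theta - (alpha / m) * X^T (h - y)
        theta = theta - (alpha / m) * (x.T @ (h - y))
    return J.item(), theta

np.random.seed(1)
tmp_X = np.append(np.ones((10, 1)), np.random.rand(10, 2) * 2000, axis=1)
tmp_Y = (np.random.rand(10, 1) > 0.35).astype(float)
tmp_J, tmp_theta = gradientDescent(tmp_X, tmp_Y, np.zeros((3, 1)), 1e-8, 700)
print(f"The cost after training is {tmp_J:.8f}.")
print(f"The resulting vector of weights is {[round(float(t), 8) for t in np.squeeze(tmp_theta)]}")
```

`J.item()` replaces `float(J)` here because recent NumPy versions may emit a `DeprecationWarning` when `float()` is applied to an array with more than zero dimensions, and `round(float(t), 8)` keeps the printed list free of `np.float64(...)` wrappers under NumPy 2.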