python, numpy, machine-learning, linear-regression, gradient-descent

Gradient descent weights keep getting larger


To get familiar with the gradient descent algorithm, I tried to create my own linear regression model. It works fine for a few data points, but when I try to fit it on more data, w0 and w1 keep increasing in size. Can someone explain this phenomenon?

import numpy as np


class LinearRegression:
    def __init__(self, x_vector, y_vector):

        self.x_vector = np.array(x_vector, dtype=np.float64)
        self.y_vector = np.array(y_vector, dtype=np.float64)
        self.w0 = 0
        self.w1 = 0

    def _get_predicted_values(self, x):
        formula = lambda x: self.w0 + self.w1 * x
        return formula(x)

    def _get_gradient_matrix(self):
        # Gradient of the summed squared error with respect to w0 and w1
        predictions = self._get_predicted_values(self.x_vector)
        w0_hat = sum((self.y_vector - predictions))
        w1_hat = sum((self.y_vector - predictions) * self.x_vector)

        gradient_matrix = np.array([w0_hat, w1_hat])
        gradient_matrix = -2 * gradient_matrix

        return gradient_matrix

    def fit(self, step_size=0.001, num_iterations=500):
        for _ in range(num_iterations):
            gradient_matrix = self._get_gradient_matrix()
            self.w0 -= step_size * (gradient_matrix[0])
            self.w1 -= step_size * (gradient_matrix[1])

    def show_coefficients(self):
        print(f"w0: {self.w0}\tw1: {self.w1}\t")

    def predict(self, x):
        y = self.w0 + self.w1 * x
        return y
# This works fine
x = [x for x in range(-3, 3)]
f = lambda x: 5 * x - 7
y = [f(x_val) for x_val in x]

model = LinearRegression(x, y)
model.fit(num_iterations=3000)

model.show_coefficients()  # output: w0: -6.99999999999994   w1: 5.00000000000002

# While this doesn't
x = [x for x in range(-50, 50)] # Increased the number of x values
f = lambda x: 5 * x - 7
y = [f(x_val) for x_val in x]

model = LinearRegression(x, y)
model.fit(num_iterations=3000)

model.show_coefficients()

Fitting the second model produces a warning:

RuntimeWarning: overflow encountered in multiply
w1_hat = sum((self.y_vector - predictions) * self.x_vector)
formula = lambda x: self.w0 + self.w1 * x

Solution

  • There are two possible solutions (a short sketch follows this list):

    1. If we are talking about MSE and its derivative, then one thing is missing in your code: division by the number of samples. With 100 points in the range [-50, 50), the summed gradients become very large, each update overshoots the minimum, and the weights keep growing until they overflow. So I would recommend trying this: gradient_matrix = -2 * gradient_matrix / len(self.x_vector)
    2. If you really want to keep using the (un-normalized) summed squared error, decrease step_size instead, so that the smaller steps no longer overshoot the function minimum.
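To make option 1 concrete, here is a minimal sketch that reuses the LinearRegression class and the numpy import from the question and only overrides the gradient computation to average over the samples. The subclass name LinearRegressionMSE is just an illustrative choice, not part of your code:

class LinearRegressionMSE(LinearRegression):
    def _get_gradient_matrix(self):
        # Same gradient as before, but divided by the number of samples,
        # i.e. the gradient of the mean squared error rather than the
        # summed squared error, so its scale no longer grows with the
        # size of the dataset.
        n = len(self.x_vector)
        predictions = self._get_predicted_values(self.x_vector)
        w0_hat = np.sum(self.y_vector - predictions)
        w1_hat = np.sum((self.y_vector - predictions) * self.x_vector)
        return -2 * np.array([w0_hat, w1_hat]) / n

x = list(range(-50, 50))
y = [5 * x_val - 7 for x_val in x]

model = LinearRegressionMSE(x, y)
model.fit(num_iterations=3000)
model.show_coefficients()  # no overflow; w1 ≈ 5, w0 heads towards -7

For option 2, something on the order of model.fit(step_size=1e-5, num_iterations=3000) with the original class should also avoid the overflow for this particular x range, because the smaller step compensates for the much larger summed gradient.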