I am writing linear regression code in Python. I used the formulas I learned and checked them, and I also tried normalising the dataset. That stopped the weight and bias values from growing exponentially, but something still seems off: the bias ends up in the range of 10^-18. I am using the real estate dataset from Kaggle, here is the link: https://www.kaggle.com/datasets/nitinsharma05/real-estate-analysis-dataset
Here is the code I wrote:
m -> weight
c -> bias
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt

def gradient_descent(x, y, m, c, n):
    # Gradients of the mean squared error with respect to m and c
    delta_m = (-2/n) * np.sum(x * (y - (m*x + c)))
    delta_c = (-2/n) * np.sum(y - (m*x + c))
    return delta_m, delta_c

def linear_regression(epochs, m, c, learning_rate, x, y):
    n = len(x)
    for i in range(epochs):
        delta_m, delta_c = gradient_descent(x, y, m, c, n)
        m -= learning_rate * delta_m
        c -= learning_rate * delta_c
        if (i+1) % 1000 == 0:
            print(f"After {i+1} epochs: m = {m}, c = {c}")
    return m, c
df = pd.read_csv("archive/real_estate.csv")
df = df[['apartment_total_area', 'price_in_USD']].dropna()

# Clean apartment_total_area: strip the unit, spaces, and decimal commas
df['apartment_total_area'] = (
    df['apartment_total_area']
    .astype(str)
    .str.replace(' m²', '', regex=False)
    .str.replace(' ', '', regex=False)
    .str.replace(',', '.', regex=False)
    .astype(float)
)

# Clean price_in_USD: strip the currency symbol and thousands separators
df['price_in_USD'] = (
    df['price_in_USD']
    .astype(str)
    .str.replace('$', '', regex=False)
    .str.replace(',', '', regex=False)
    .str.strip()
    .astype(float)
)
x = df['apartment_total_area'].to_numpy()
y = df['price_in_USD'].to_numpy()

# Standardize both variables so gradient descent is well-conditioned
x_mean, x_std = x.mean(), x.std()
y_mean, y_std = y.mean(), y.std()
x_s = (x - x_mean)/x_std
y_s = (y - y_mean)/y_std

m, c = 0, 0
epochs = 20000
learning_rate = 0.0001
m_s, c_s = linear_regression(epochs, m, c, learning_rate, x_s, y_s)

# Map the standardized parameters back to the original units
m_orig = m_s * (y_std / x_std)
c_orig = y_mean + c_s * y_std - m_orig * x_mean
print(f"Final slope (m): {m_orig}")
print(f"Final intercept (c): {c_orig}")
plt.scatter(x, y, color="blue")
plt.plot(x, m_orig*x + c_orig, color="red")
plt.xlabel("Apartment Total Area (m²)")
plt.ylabel("Price in USD")
plt.show()
Can someone help me figure out what the problem is? I also tried changing the learning rate from 1 down to 10^-18, but nothing worked, and in particular the values of the weights and bias didn't change. The image shows the regression line my code currently produces.
Your implementation of gradient descent is basically correct — the main issues come from feature scaling and the learning rate. A few key points:
Normalization:
You standardized both x and y (x_s, y_s), which is fine for training. But then, when you “denormalize” the parameters back, the intercept c_orig can become very small (close to 0, on the order of 1e-18) simply because the regression line passes very close to the origin in normalized space. That’s expected, not a bug.
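Concretely, the back-transformation follows from substituting the standardization into the fitted line (this is just algebra on the formulas already in your script):

y_s = m_s * x_s + c_s
(y - y_mean) / y_std = m_s * (x - x_mean) / x_std + c_s
y = (m_s * y_std / x_std) * x + (y_mean + c_s * y_std - (m_s * y_std / x_std) * x_mean)

so m_orig = m_s * (y_std / x_std) and c_orig = y_mean + c_s * y_std - m_orig * x_mean, which is exactly what your code computes.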
Learning rate:
0.0001 may still be too small for standardized data. Try 0.01 or 0.1. On the other hand, with unscaled data, large rates will blow up. So:
If you scale → use a larger learning rate.
If you don’t scale → use a smaller one.
(A quick way to compare a few rates is sketched right after this list.)
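A minimal sweep on the standardized data, assuming the linear_regression function and x_s, y_s from your script are in scope (the candidate rates and epoch count here are just illustrative):

for lr in (1e-4, 1e-3, 1e-2, 1e-1):
    m_fit, c_fit = linear_regression(2000, 0.0, 0.0, lr, x_s, y_s)
    mse = np.mean((y_s - (m_fit * x_s + c_fit)) ** 2)
    print(f"lr={lr:g}: m = {m_fit:.4f}, c = {c_fit:.2e}, MSE = {mse:.5f}")

The largest rate whose MSE still decreases smoothly (rather than oscillating or diverging) is a reasonable pick.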
Intercept near zero:
That’s normal after scaling. If you train on (x_s, y_s), the model is y_s = m_s * x_s + c_s. Since x_s and y_s both have mean zero, the least-squares intercept in standardized space is exactly zero, so a value like 1e-18 is just floating-point residue around it. When you transform back, c_orig is adjusted with y_mean and x_mean. So even if c_s ≈ 0, your denormalized model is fine.
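You can demonstrate this with synthetic numbers (everything below is made up for illustration, not your dataset):

import numpy as np

rng = np.random.default_rng(1)
x = rng.uniform(20, 200, 1000)                    # fake areas in m²
y = 1500 * x + 30000 + rng.normal(0, 5000, 1000)  # fake prices

x_s = (x - x.mean()) / x.std()
y_s = (y - y.mean()) / y.std()

# Closed-form least-squares fit in standardized space
m_s = (x_s * y_s).mean()             # slope equals corr(x, y) here
c_s = y_s.mean() - m_s * x_s.mean()  # exactly 0 in theory
print(c_s)                           # prints something on the order of 1e-17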
Check against sklearn:
Always validate your implementation by comparing with scikit-learn’s LinearRegression:
from sklearn.linear_model import LinearRegression
model = LinearRegression()
model.fit(x.reshape(-1, 1), y)
print(model.coef_, model.intercept_)
If your slope and intercept match (up to tiny floating-point differences), your code is working correctly.
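To make that comparison explicit, something like this works (assuming m_orig and c_orig from your script are in scope; the tolerance is arbitrary):

import numpy as np

print(np.allclose(model.coef_[0], m_orig, rtol=1e-3))
print(np.allclose(model.intercept_, c_orig, rtol=1e-3))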