Tags: scikit-learn, linear-regression

Why does the y-intercept change when all features are used in a linear regression model vs only one feature?


I'm training a linear regression model on the "advertising sales dataset".

When I train on all of the features (columns) of the dataset together, I get a different y-intercept than when I train the model on one column only (TV).

Shouldn't the y-intercept stay the same?

The weight of the TV column stays roughly the same.

import pandas as pd
import seaborn as sns
import matplotlib.pyplot as plt
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import train_test_split

data = pd.read_csv('../datasets/Advertising Budget and Sales.csv')

data = data.rename(columns={
    'TV Ad Budget ($)': 'TV',
    'Radio Ad Budget ($)': 'Radio',
    'Newspaper Ad Budget ($)': 'Newspaper',
    'Sales ($)': 'Sales',
    })

data = data.drop(columns=['Unnamed: 0'])

Use all columns together:

X = data[['TV', 'Radio', 'Newspaper']]
y = data['Sales']

X_train, X_test, y_train, y_test = train_test_split(X, y, train_size=0.7, test_size=0.3, shuffle=True, random_state=100)


lr = LinearRegression().fit(X_train, y_train)

coeff = lr.coef_
intercept = lr.intercept_

print('coefficients of TV, Radio, and Newspaper:', coeff)
print('y intercept:', intercept)

coefficients of TV, Radio, and Newspaper: [0.0454256 0.18975773 0.00460308]

y intercept: 2.652789668879496

# Plot Sales vs. TV with the line implied by the multivariate model's
# TV coefficient and intercept (Radio and Newspaper contributions are left out)
plt.scatter(data['TV'], data.Sales)
plt.plot(data.TV, intercept + coeff[0] * data.TV);

[plot: Sales vs. TV scatter with the line from the multivariate model]

Use the TV feature only:

# Note: y is passed as a one-column DataFrame here, so coef_ comes back 2-D
# and intercept_ comes back as a 1-element array (unlike the first run)
X_train, X_test, y_train, y_test = train_test_split(data['TV'].values.reshape(-1, 1),
                                                    data[['Sales']],
                                                    train_size=0.7, test_size=0.3, shuffle=True, random_state=100)

lr = LinearRegression().fit(X_train, y_train)

coeff_2 = lr.coef_
intercept_2 = lr.intercept_

print('coefficient of TV:', coeff_2)
print('y intercept:', intercept_2)

coefficient of TV: [[0.04649736]]

y intercept: [6.98966586]

plt.scatter(data['TV'], data.Sales)
plt.plot(data.TV, intercept_2 + coeff_2[0][0] * data.TV);

[plot: Sales vs. TV scatter with the line from the single-feature model]
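
For a direct comparison, both fitted lines can be drawn on one scatter plot (a minimal sketch reusing coeff/intercept and coeff_2/intercept_2 from the two runs above):

# Overlay the multivariate model's TV slice and the single-feature fit
plt.scatter(data['TV'], data.Sales, alpha=0.5)
plt.plot(data.TV, intercept + coeff[0] * data.TV, label='all features (TV slice)')
plt.plot(data.TV, intercept_2 + coeff_2[0][0] * data.TV, label='TV only')
plt.legend()
plt.show()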


Solution

  • No, the models will be different, including:

    1. A different number of columns results in different model weights (coefficients).

    2. The intercepts will usually differ.

    3. Prediction results and model explanatory power (e.g. R²) will also differ.

    When you add more features, the model readjusts the contribution of all variables to minimize the overall error, which also changes the optimal solution for the intercept.

    LinearRegression() fits an ordinary least squares model, whose prediction formula is

    y_hat = w0 + w1 * x1 + w2 * x2 + ... + wn * xn

    Therefore, when you change the number of columns in X (that is, the number of features fed to the model), you change the optimization problem itself, and the learned weights and intercept change with it.
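
    In fact, for ordinary least squares with fit_intercept=True, the intercept is fully determined by the training means and the fitted weights:

    w0 = mean(y) - w1 * mean(x1) - ... - wn * mean(xn)

    so whenever the coefficients change, the intercept changes with them. A minimal sketch to check this identity on a fitted model (assuming the multivariate lr, X_train, and y_train from the question):

    import numpy as np

    # OLS identity: intercept_ == mean(y_train) - coef_ . mean(X_train)
    reconstructed = y_train.mean() - np.dot(lr.coef_, X_train.mean(axis=0))
    print(np.isclose(lr.intercept_, reconstructed))  # expect: True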

    Example

    import pandas as pd
    from sklearn.linear_model import LinearRegression
    from sklearn.model_selection import train_test_split
    from sklearn.metrics import r2_score
    
    # Create a sample data set (simulated marketing budget and sales)
    data = pd.DataFrame({
        'TV': [230.1, 44.5, 17.2, 151.5, 180.8, 8.7, 57.5, 120.2, 8.6, 199.8],
        'Radio': [37.8, 39.3, 45.9, 41.3, 10.8, 48.9, 32.8, 19.6, 2.1, 2.6],
        'Newspaper': [69.2, 45.1, 69.3, 58.5, 58.4, 75.0, 23.5, 11.6, 1.0, 21.2],
        'Sales': [22.1, 10.4, 9.3, 18.5, 12.9, 7.2, 11.8, 13.2, 4.8, 10.6]
    })
    
    # Prepare X, y separately (single feature vs multiple features)
    X1 = data[['TV']]
    X3 = data[['TV', 'Radio', 'Newspaper']]
    y = data['Sales']
    
    # Split the data (the same random_state keeps train/test rows aligned across both splits)
    X1_train, X1_test, y_train, y_test = train_test_split(X1, y, test_size=0.3, random_state=42)
    X3_train, X3_test, _, _ = train_test_split(X3, y, test_size=0.3, random_state=42)
    
    # Build and train the model
    model1 = LinearRegression().fit(X1_train, y_train)
    model3 = LinearRegression().fit(X3_train, y_train)
    
    # predict
    y_pred1 = model1.predict(X1_test)
    y_pred3 = model3.predict(X3_test)
    
    # Output comparison
    print("Univariate Model:")
    print(f"  Intercept: {model1.intercept_:.4f}")
    print(f"  TV Coefficient: {model1.coef_[0]:.4f}")
    print(f"  R² : {r2_score(y_test, y_pred1):.4f}")
    
    print("\nMultivariate Model:")
    print(f"  Intercept: {model3.intercept_:.4f}")
    print(f"  Coefficients (TV, Radio, Newspaper): {model3.coef_}")
    print(f"  R²: {r2_score(y_test, y_pred3):.4f}")