python machine-learning scikit-learn

Is it possible to kill specific coefficients in a LinearRegression model?


For a design-of-experiments task I am trying to do some data processing in Python using sklearn. For that I need a multivariate polynomial regression. The code is based on https://saturncloud.io/blog/multivariate-polynomial-regression-with-python/.

For my specific task I need to "kill" some coefficients. Is it possible to do so? For example, let's say I have a degree-2 polynomial:

y = a0 + a1 * x1 + a2 * x2 + a3 * x1 * x2 + a4 * x1^2 + a5 * x2^2

I want to kill x1 * x2 and x2^2, which means a3 and a5 become 0. My final function should look like:

y = a0 + a1 * x1 + a2 * x2 + a4 * x1^2

I want to do this before model.fit() so the terms are ignored during fitting. It would be nice if someone had an idea how to do this. I know Lasso could be used instead of LinearRegression to shrink coefficients, but that does not remove specific terms the way I want.

Here is the code:

import numpy as np
import pandas as pd
from sklearn.linear_model import LinearRegression
from sklearn.preprocessing import PolynomialFeatures
from sklearn.datasets import fetch_california_housing

# Load the California housing dataset
california = fetch_california_housing()
df = pd.DataFrame(california.data, columns=california.feature_names)
df['PRICE'] = california.target

# Select appropriate features from the dataset
# Assuming you want to use features like 'MedInc' (median income), 'HouseAge', and 'AveRooms' (average rooms)
X = df[['MedInc', 'HouseAge', 'AveRooms']].values
y = df['PRICE'].values

# Polynomial features
poly = PolynomialFeatures(degree=5)
X_poly = poly.fit_transform(X)

# Linear Regression model
model = LinearRegression()
model.fit(X_poly, y)

# New data with median income, house age, and average number of rooms
new_data = np.array([[3, 20, 5]])  # Example values for median income, house age, and average rooms
new_data_poly = poly.transform(new_data)

# Predicting the price
predicted_price = model.predict(new_data_poly)
print(predicted_price)

Solution

  • You can use np.delete to remove the unwanted columns from X_poly directly, prior to fitting the model; a fuller sketch follows in the next bullet.

    # Polynomial features
    poly   = PolynomialFeatures(degree=5)
    X_poly = poly.fit_transform(X)
    
    # Remove unwanted term columns by index; with degree=5 and three input
    # features the column layout differs from the degree-2 example above, so
    # check poly.get_feature_names_out() to map terms to column indices
    exclude_indices = [3, 5]  # example indices of the columns to drop
    X_poly_reduced  = np.delete(X_poly, exclude_indices, axis=1)
    
    # Linear Regression model
    model = LinearRegression()
    model.fit(X_poly_reduced, y)
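
  • To avoid hard-coding the indices, poly.get_feature_names_out() (sklearn >= 1.0) tells you which output column corresponds to which term, and the same columns have to be removed from any new data before calling predict. Below is a minimal sketch, assuming the degree-2, two-feature polynomial from the question and reusing the X, y and new_data defined above (only their first two columns, named 'x1' and 'x2' here purely for illustration):

    import numpy as np
    from sklearn.linear_model import LinearRegression
    from sklearn.preprocessing import PolynomialFeatures
    
    # Degree-2 expansion of two features, matching the polynomial in the question
    poly = PolynomialFeatures(degree=2)
    X_poly = poly.fit_transform(X[:, :2])  # assumption: use only the first two columns of X
    
    # Column order is [1, x1, x2, x1^2, x1 x2, x2^2]; look the terms up by name
    feature_names = poly.get_feature_names_out(['x1', 'x2'])
    exclude_terms = ['x1 x2', 'x2^2']  # the terms to "kill" (a3 and a5)
    exclude_indices = [i for i, name in enumerate(feature_names) if name in exclude_terms]
    
    # Drop the columns before fitting
    X_poly_reduced = np.delete(X_poly, exclude_indices, axis=1)
    model = LinearRegression()
    model.fit(X_poly_reduced, y)
    
    # The same columns must be dropped from new data before predicting
    new_data_poly_reduced = np.delete(poly.transform(new_data[:, :2]), exclude_indices, axis=1)
    print(model.predict(new_data_poly_reduced))

    The fitted model then never sees the x1 * x2 and x2^2 columns, which is equivalent to forcing a3 and a5 to 0.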