For a design of experiments (DoE) study, I am trying to do some data processing in Python using sklearn. For this I need to fit a multivariate polynomial regression. The code is based on https://saturncloud.io/blog/multivariate-polynomial-regression-with-python/.
For my specific task I need to "kill" some coefficients, i.e. force them to be exactly zero. Is it possible to do so? For example, say I have a degree-2 polynomial:
y = a0 + a1 * x1 + a2 * x2 + a3 * x1 * x2 + a4 * x1^2 + a5 * x2^2
I want to kill the x1 * x2 and x2^2 terms, which means a3 and a5 are 0. My final function should look like:
y = a0 + a1 * x1 + a2 * x2 + a4 * x1^2
I want to do this before model.fit() so that these terms are ignored during fitting. It would be nice if someone had an idea how to do this. I know Lasso can be used instead of LinearRegression to shrink coefficients, but it does not let me choose which coefficients become zero.
Here is the code:
import numpy as np
import pandas as pd
from sklearn.linear_model import LinearRegression
from sklearn.preprocessing import PolynomialFeatures
from sklearn.datasets import fetch_california_housing
# Load the California housing dataset
california = fetch_california_housing()
df = pd.DataFrame(california.data, columns=california.feature_names)
df['PRICE'] = california.target
# Select appropriate features from the dataset
# Assuming you want to use features like 'MedInc' (median income), 'HouseAge', and 'AveRooms' (average rooms)
X = df[['MedInc', 'HouseAge', 'AveRooms']].values
y = df['PRICE'].values
# Polynomial features
poly = PolynomialFeatures(degree=5)
X_poly = poly.fit_transform(X)
# Linear Regression model
model = LinearRegression()
model.fit(X_poly, y)
# New data with median income, house age, and average number of rooms
new_data = np.array([[3, 20, 5]]) # Example values for median income, house age, and average rooms
new_data_poly = poly.transform(new_data)
# Predicting the price
predicted_price = model.predict(new_data_poly)
print(predicted_price)
You can use np.delete to remove the unwanted columns from the polynomial feature matrix before fitting the model.
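# Polynomial features
poly = PolynomialFeatures(degree=5)
X_poly = poly.fit_transform(X)
# Remove unwanted terms (e.g., x1 * x2 and x2^2, assuming x1 = MedInc and x2 = HouseAge).
# Look the columns up by name: their positions depend on the number of inputs and the degree.
names = list(poly.get_feature_names_out(['MedInc', 'HouseAge', 'AveRooms']))
exclude_indices = [names.index('MedInc HouseAge'), names.index('HouseAge^2')]
X_poly_reduced = np.delete(X_poly, exclude_indices, axis=1)
# Linear Regression model
model = LinearRegression()
model.fit(X_poly_reduced, y)
# New data needs the same columns removed before predicting
new_data_poly_reduced = np.delete(poly.transform(new_data), exclude_indices, axis=1)
print(model.predict(new_data_poly_reduced))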