I've used sklearn for polynomial ridge regression and, using grid search, I'm happy with the results. Now I'd like to render the model as a simple polynomial equation to run in a small Python module. The grid search returns the best degree and alpha parameters. The latter only sets the regularization strength during training; the former tells me the maximum degree of the resulting equation. But what are the coefficients of the equation it has found? I expect an equation of the form ax^3 + bx^2 + cx + d, so what are a, b, c, d?
Code for grid search pipeline:
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler, PolynomialFeatures
from sklearn.linear_model import Ridge, Lasso, SGDRegressor
...
# === 4. Polynomial Ridge ===
poly_pipe = Pipeline([
    ('scaler', StandardScaler()),
    ('poly', PolynomialFeatures()),
    ('ridge', Ridge())
])
poly_params = {
    'poly__degree': [2, 3],
    'ridge__alpha': [0.01, 0.1, 1]
}
evaluate_model('Polynomial Ridge', poly_pipe, poly_params, is_linear=False)
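For reference, the fitted coefficients are not returned by the grid search itself; they live on the `Ridge` step of the (best) fitted pipeline, via `coef_` and `intercept_`. One wrinkle: because of the `StandardScaler`, those coefficients apply to the scaled input z = (x - mean)/std, not to raw x. A minimal sketch with toy data (the pipeline mirrors the one above; the data and variable names are mine):

```python
import numpy as np
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler, PolynomialFeatures
from sklearn.linear_model import Ridge

# Toy data from a known cubic
X = np.linspace(0, 10, 50).reshape(-1, 1)
y = 0.3 * X[:, 0]**3 - 2 * X[:, 0]**2 + 5 * X[:, 0] + 1

pipe = Pipeline([
    ('scaler', StandardScaler()),
    ('poly', PolynomialFeatures(degree=3)),
    ('ridge', Ridge(alpha=0.01))
])
pipe.fit(X, y)

# Coefficients apply to z = (x - mean)/std, one per polynomial feature
coefs = pipe.named_steps['ridge'].coef_
intercept = pipe.named_steps['ridge'].intercept_
mean = pipe.named_steps['scaler'].mean_[0]
std = pipe.named_steps['scaler'].scale_[0]

# Reproduce the pipeline's prediction by hand:
# y = intercept + c0 + c1*z + c2*z^2 + c3*z^3
# (c0 multiplies the constant bias column that PolynomialFeatures adds)
z = (X[:, 0] - mean) / std
y_manual = intercept + sum(c * z**k for k, c in enumerate(coefs))
print(np.allclose(y_manual, pipe.predict(X)))  # True
```

To recover a, b, c, d in terms of raw x you would have to expand the powers of (x - mean)/std, or simply drop the scaler from the pipeline before fitting.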
Based on the comments received, there are two things of note: I use pandas for compactness here, and I expected `coef_` to include the y-intercept but it does not, so my code shows the equation with the intercept added in, but also includes the first coefficient (p[0]) that I expected to be the y-intercept.

import numpy as np
import pandas as pd
import warnings
# regression libs
from sklearn.pipeline import make_pipeline
from sklearn.linear_model import Ridge
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import PolynomialFeatures
# useful initializations
warnings.filterwarnings('ignore')
p = [0, 10, -20, .30]
# Create fake data
def regr_noise(x, p):
    # cubic polynomial plus a large uniform noise term
    mu = np.random.uniform(0, 50E6)
    return p[0] + p[1]*x + p[2]*x**2 + p[3]*x**3 + mu
x = range(0,1000, 50)
df_fake = pd.DataFrame({'x':x})
df_fake['y'] = df_fake['x'].apply(lambda x: regr_noise(x, p))
# polynomial ridge regression
x = df_fake[['x']]
y = df_fake['y']  # 1-D target, so coef_ comes back as a flat array
ridge_reg_poly = make_pipeline(PolynomialFeatures(degree=3), Ridge(alpha=0, solver="cholesky"))
ridge_reg_poly.fit(x, y)
yp = ridge_reg_poly.predict(x)
# get the coefficients and reproduce the results
p_found = ridge_reg_poly.steps[1][1].coef_        # first entry is the bias-column term, not the intercept
p_interc = ridge_reg_poly.steps[1][1].intercept_
print('coeff:', p_found, '\nintercept:', p_interc)
def regr_clean(x, p):
    # p[0] multiplies the constant bias column, not the intercept,
    # so the fitted intercept_ must still be added separately
    return p[0] + p[1]*x + p[2]*x**2 + p[3]*x**3 + p_interc

y_fmcoef = regr_clean(x, p_found).rename(columns={'x': 'y'})
print(yp - y_fmcoef.y)  # residuals ~0: the recovered equation reproduces the model