python, scikit-learn, regression, polynomial-approximations

Ridge Polynomial Regression: How to get parameters for equation found


I've used sklearn for polynomial ridge regression and, using grid search, I am happy with the results. Now I would like to render the model as a simple polynomial equation to run in a small Python module. Grid search returns the best degree and alpha parameters. The latter only sets the regularization strength for training; the former tells me the maximum degree of the resulting equation. But what are the coefficients of the equation it has found? I expect the equation to be of the form ax^3 + bx^2 + cx + d, so what are a, b, c, d?

Code for grid search pipeline:

from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler, PolynomialFeatures
from sklearn.linear_model import Ridge, Lasso, SGDRegressor
# ...
# === 4. Polynomial Ridge ===
poly_pipe = Pipeline([
    ('scaler', StandardScaler()),
    ('poly', PolynomialFeatures()),
    ('ridge', Ridge())
])
poly_params = {
    'poly__degree': [2, 3],
    'ridge__alpha': [0.01, 0.1, 1]
}
evaluate_model('Polynomial Ridge', poly_pipe, poly_params, is_linear=False)
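
For context: because this pipeline scales x before building the polynomial features, the fitted coefficients describe a polynomial in the scaled variable z = (x - mean)/std, not in x itself. A minimal sketch of where the parameters live on a fitted pipeline (degree, alpha, and data are made up here, since `evaluate_model` is not shown; with `GridSearchCV` you would read the same attributes off `grid.best_estimator_.named_steps['ridge']`):

```python
import numpy as np
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler, PolynomialFeatures
from sklearn.linear_model import Ridge

# toy data: a noiseless cubic, just to have something to fit
X = np.linspace(-5, 5, 100).reshape(-1, 1)
y = 0.3*X[:, 0]**3 - 2*X[:, 0]**2 + 10*X[:, 0] + 1

pipe = Pipeline([
    ('scaler', StandardScaler()),
    ('poly', PolynomialFeatures(degree=3)),
    ('ridge', Ridge(alpha=0.01)),
])
pipe.fit(X, y)

# the equation's parameters live on the Ridge step
ridge = pipe.named_steps['ridge']
print(ridge.coef_)       # coefficients for [1, z, z^2, z^3], z = scaled x
print(ridge.intercept_)

# sanity check: rebuild predict() by hand from those parameters
z = pipe.named_steps['scaler'].transform(X)
feats = pipe.named_steps['poly'].transform(z)
manual = feats @ ridge.coef_ + ridge.intercept_
print(np.allclose(manual, pipe.predict(X)))   # True
```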

Solution

  • Based on the comments, two things are worth noting: the fitted coefficients live on the Ridge step of the pipeline (its `coef_` and `intercept_` attributes), and you can verify them by plugging them back into a plain polynomial and reproducing the pipeline's predictions:

    import numpy as np
    import pandas as pd
    import warnings
    
    # regression libs
    from sklearn.linear_model import Ridge
    from sklearn.pipeline import make_pipeline
    from sklearn.preprocessing import PolynomialFeatures
    
    # useful initializations
    warnings.filterwarnings('ignore')
    
    # true coefficients, lowest degree first
    p = [0, 10, -20, .30]
    
    # Create fake data: a cubic polynomial plus uniform noise
    def regr_noise(x, p):
        mu = np.random.uniform(0, 50E6)
        return p[0] + p[1]*x + p[2]*x**2 + p[3]*x**3 + mu
    
    x = range(0, 1000, 50)
    df_fake = pd.DataFrame({'x': x})
    df_fake['y'] = df_fake['x'].apply(lambda x: regr_noise(x, p))
    
    # polynomial ridge regression
    X = df_fake[['x']]
    y = df_fake['y']   # 1-D target, so coef_ comes back as a flat array
    
    ridge_reg_poly = make_pipeline(PolynomialFeatures(degree=3),
                                   Ridge(alpha=0, solver="cholesky"))
    ridge_reg_poly.fit(X, y)
    yp = ridge_reg_poly.predict(X)
    
    # get the coefficients and intercept from the Ridge step
    ridge = ridge_reg_poly.named_steps['ridge']
    p_found = ridge.coef_        # one coefficient per feature: [1, x, x^2, x^3]
    p_interc = ridge.intercept_
    print('coeff:', p_found, '\nintercept:', p_interc)
    
    # reproduce the predictions from the plain polynomial equation
    def regr_clean(x, p, intercept):
        return intercept + p[0] + p[1]*x + p[2]*x**2 + p[3]*x**3
    
    y_fmcoef = regr_clean(df_fake['x'], p_found, p_interc)
    
    # differences should be numerically zero
    print((yp - y_fmcoef).abs().max())
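
The example above fits on raw x, so `coef_` already gives the a, b, c, d of the question. If the pipeline includes a `StandardScaler` (as the question's does), the coefficients instead refer to the scaled variable z = (x - mean)/std, and you have to substitute z back to recover the raw-x equation. One way is composing with numpy's `Polynomial` class; a sketch on made-up noiseless cubic data (hyperparameters are illustrative, and the recovered coefficients should land close to the true [1, 10, -20, 0.3]):

```python
import numpy as np
from numpy.polynomial import Polynomial
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler, PolynomialFeatures
from sklearn.linear_model import Ridge

# noiseless cubic: y = 1 + 10x - 20x^2 + 0.3x^3
X = np.linspace(0, 10, 200).reshape(-1, 1)
y = 1 + 10*X[:, 0] - 20*X[:, 0]**2 + 0.3*X[:, 0]**3

pipe = Pipeline([
    ('scaler', StandardScaler()),
    ('poly', PolynomialFeatures(degree=3)),
    ('ridge', Ridge(alpha=1e-8)),
])
pipe.fit(X, y)

scaler = pipe.named_steps['scaler']
ridge = pipe.named_steps['ridge']
m, s = scaler.mean_[0], scaler.scale_[0]

# coefficients in z, lowest degree first; fold the intercept into c0
c = ridge.coef_.copy()
c[0] += ridge.intercept_

# substitute z = (x - m)/s by composing with the linear polynomial -m/s + x/s
pz = Polynomial(c)
px = pz(Polynomial([-m/s, 1/s]))
print(px.coef)   # ~ [1, 10, -20, 0.3]: d, c, b, a, lowest degree first
```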