python, scikit-learn, regression, polynomial-approximations

Ridge Polynomial Regression: How to get parameters for equation found


I've used sklearn for polynomial ridge regression and, using grid search, I am happy with the results. Now I would like to render the model as a simple polynomial equation to run in a small Python module. Grid search returns the best degree and alpha parameters. The latter only sets the regularization strength for training; the former tells me the maximum degree of the resulting equation. But what are the coefficients of the equation it has found? I expect the equation to be of the form ax^3 + bx^2 + cx + d, so what are a, b, c, d?

Code for grid search pipeline:

from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler, PolynomialFeatures
from sklearn.linear_model import Ridge, Lasso, SGDRegressor
# ...
# === 4. Polynomial Ridge ===
poly_pipe = Pipeline([
    ('scaler', StandardScaler()),
    ('poly', PolynomialFeatures()),
    ('ridge', Ridge())
])
poly_params = {
    'poly__degree': [2, 3],
    'ridge__alpha': [0.01, 0.1, 1]
}
evaluate_model('Polynomial Ridge', poly_pipe, poly_params, is_linear=False)
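
For context: because this pipeline scales x before building the polynomial features, the fitted coefficients describe a polynomial in the scaled variable z = (x - mean)/std, not in x itself. A minimal sketch of where the parameters live on a fitted pipeline (degree, alpha, and data are made up here, since `evaluate_model` is not shown; with `GridSearchCV` you would read the same attributes off `grid.best_estimator_.named_steps['ridge']`):

```python
import numpy as np
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler, PolynomialFeatures
from sklearn.linear_model import Ridge

# toy data: a noiseless cubic, just to have something to fit
X = np.linspace(-5, 5, 100).reshape(-1, 1)
y = 0.3*X[:, 0]**3 - 2*X[:, 0]**2 + 10*X[:, 0] + 1

pipe = Pipeline([
    ('scaler', StandardScaler()),
    ('poly', PolynomialFeatures(degree=3)),
    ('ridge', Ridge(alpha=0.01)),
])
pipe.fit(X, y)

# the equation's parameters live on the Ridge step
ridge = pipe.named_steps['ridge']
print(ridge.coef_)       # coefficients for [1, z, z^2, z^3], z = scaled x
print(ridge.intercept_)

# sanity check: rebuild predict() by hand from those parameters
z = pipe.named_steps['scaler'].transform(X)
feats = pipe.named_steps['poly'].transform(z)
manual = feats @ ridge.coef_ + ridge.intercept_
print(np.allclose(manual, pipe.predict(X)))   # True
```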

Solution

  • Based on the comments, two things are worth noting: the fitted coefficients live on the Ridge step of the pipeline (its `coef_` and `intercept_` attributes), and you can verify them by plugging them back into a plain polynomial and reproducing the pipeline's predictions:

    import numpy as np
    import pandas as pd
    import warnings
    
    # regression libs
    from sklearn.linear_model import Ridge
    from sklearn.pipeline import make_pipeline
    from sklearn.preprocessing import PolynomialFeatures
    
    # useful initializations
    warnings.filterwarnings('ignore')
    
    # true coefficients, lowest degree first
    p = [0, 10, -20, .30]
    
    # Create fake data: a cubic polynomial plus uniform noise
    def regr_noise(x, p):
        mu = np.random.uniform(0, 50E6)
        return p[0] + p[1]*x + p[2]*x**2 + p[3]*x**3 + mu
    
    x = range(0, 1000, 50)
    df_fake = pd.DataFrame({'x': x})
    df_fake['y'] = df_fake['x'].apply(lambda x: regr_noise(x, p))
    
    # polynomial ridge regression
    X = df_fake[['x']]
    y = df_fake['y']   # 1-D target, so coef_ comes back as a flat array
    
    ridge_reg_poly = make_pipeline(PolynomialFeatures(degree=3),
                                   Ridge(alpha=0, solver="cholesky"))
    ridge_reg_poly.fit(X, y)
    yp = ridge_reg_poly.predict(X)
    
    # get the coefficients and intercept from the Ridge step
    ridge = ridge_reg_poly.named_steps['ridge']
    p_found = ridge.coef_        # one coefficient per feature: [1, x, x^2, x^3]
    p_interc = ridge.intercept_
    print('coeff:', p_found, '\nintercept:', p_interc)
    
    # reproduce the predictions from the plain polynomial equation
    def regr_clean(x, p, intercept):
        return intercept + p[0] + p[1]*x + p[2]*x**2 + p[3]*x**3
    
    y_fmcoef = regr_clean(df_fake['x'], p_found, p_interc)
    
    # differences should be numerically zero
    print((yp - y_fmcoef).abs().max())
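
The example above fits on raw x, so `coef_` already gives the a, b, c, d of the question. If the pipeline includes a `StandardScaler` (as the question's does), the coefficients instead refer to the scaled variable z = (x - mean)/std, and you have to substitute z back to recover the raw-x equation. One way is composing with numpy's `Polynomial` class; a sketch on made-up noiseless cubic data (hyperparameters are illustrative, and the recovered coefficients should land close to the true [1, 10, -20, 0.3]):

```python
import numpy as np
from numpy.polynomial import Polynomial
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler, PolynomialFeatures
from sklearn.linear_model import Ridge

# noiseless cubic: y = 1 + 10x - 20x^2 + 0.3x^3
X = np.linspace(0, 10, 200).reshape(-1, 1)
y = 1 + 10*X[:, 0] - 20*X[:, 0]**2 + 0.3*X[:, 0]**3

pipe = Pipeline([
    ('scaler', StandardScaler()),
    ('poly', PolynomialFeatures(degree=3)),
    ('ridge', Ridge(alpha=1e-8)),
])
pipe.fit(X, y)

scaler = pipe.named_steps['scaler']
ridge = pipe.named_steps['ridge']
m, s = scaler.mean_[0], scaler.scale_[0]

# coefficients in z, lowest degree first; fold the intercept into c0
c = ridge.coef_.copy()
c[0] += ridge.intercept_

# substitute z = (x - m)/s by composing with the linear polynomial -m/s + x/s
pz = Polynomial(c)
px = pz(Polynomial([-m/s, 1/s]))
print(px.coef)   # ~ [1, 10, -20, 0.3]: d, c, b, a, lowest degree first
```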