python scikit-learn linear-regression polynomials hessian-matrix

What is the best way to fit a quadratic polynomial to p-dimensional data and compute its gradient and Hessian matrix?


I have been trying to use the scikit-learn library to solve this problem. Roughly:

from sklearn.preprocessing import PolynomialFeatures
from sklearn.linear_model import LinearRegression

# Make or load an n x p data matrix X and n x 1 array y of the corresponding
# function values.

poly = PolynomialFeatures(degree=2)
Xp = poly.fit_transform(X)
model = LinearRegression()
model.fit(Xp, y)

# Approximate the gradient and Hessian using the relevant finite-difference
# formulas and model.predict.

As the above illustrates, sklearn makes the design choice to separate polynomial regression into PolynomialFeatures and LinearRegression rather than combining them into a single estimator. This separation has conceptual advantages, but it also has a major drawback: it effectively prevents model from offering gradient and hessian methods, which would make model significantly more useful.

My current work-around uses finite-difference formulas and model.predict to approximate the elements of the gradient and Hessian (as described here). But I don't love this approach: it is sensitive to floating-point error, and the "exact" information needed to build the gradient and Hessian is already contained in model.coef_.
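
For concreteness, a minimal sketch of that work-around (central differences; the helper name fd_gradient, the use of poly.transform, and the step size h are my own choices):

import numpy as np

def fd_gradient(x, poly, model, h=1e-5):
    # Approximate the gradient of the fitted polynomial at x using central
    # differences on model.predict; the result depends on the step size h.
    x = np.asarray(x, dtype=float)
    g = np.empty(x.size)
    for i in range(x.size):
        e = np.zeros(x.size)
        e[i] = h
        f_plus = model.predict(poly.transform((x + e).reshape(1, -1)))[0]
        f_minus = model.predict(poly.transform((x - e).reshape(1, -1)))[0]
        g[i] = (f_plus - f_minus) / (2 * h)
    return g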

Is there any more elegant or accurate method to fit a p-dimensional polynomial and find its gradient and Hessian within Python? I would be fine with one that uses a different library.


Solution

  • To compute the gradient or the Hessian of a polynomial, one needs to know the exponents of the variables in each monomial and the corresponding monomial coefficients. The first piece of information is provided by poly.powers_, the second by model.coef_:

    from sklearn.preprocessing import PolynomialFeatures
    from sklearn.linear_model import LinearRegression
    import numpy as np
    
    np.set_printoptions(precision=2, suppress=True)
    
    # Toy data: 3 samples with 2 features each, and 3 target values
    X = np.arange(6).reshape(3, 2)
    y = np.arange(3)
    
    # Degree-2 polynomial fit via explicit feature expansion
    poly = PolynomialFeatures(degree=2)
    Xp = poly.fit_transform(X)
    model = LinearRegression()
    model.fit(Xp, y)
    
    print("Exponents:")
    print(poly.powers_.T)
    print("Coefficients:")
    print(model.coef_)
    

    This gives:

    Exponents:
    [[0 1 0 2 1 0]
     [0 0 1 0 1 2]]
    Coefficients:
    [ 0.    0.13  0.13 -0.12 -0.    0.13]
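
    As a quick sanity check, these two arrays, together with model.intercept_, fully determine the fitted polynomial, so model.predict can be reproduced from them directly:

    x = np.array([0.0, 1.0])
    manual = model.intercept_ + sum(
        c * np.prod(x ** e) for c, e in zip(model.coef_, poly.powers_)
    )
    print(manual, model.predict(poly.transform(x.reshape(1, -1)))[0])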
    

    The following function can then be used to compute the gradient at a point given by an array x:

    def gradient(x, powers, coeffs):
        # Evaluate the gradient of the polynomial defined by `powers`
        # (poly.powers_) and `coeffs` (model.coef_) at the point x.
        x = np.array(x)
        # Exponents of each monomial after differentiating w.r.t. each variable
        gp = np.maximum(0, powers[:, np.newaxis] - np.eye(powers.shape[1], dtype=int))
        gp = gp.transpose(1, 2, 0)
        # Coefficients picked up by the differentiation (zero where the
        # variable did not appear in the monomial)
        gc = coeffs * powers.T
        return (((x[:, np.newaxis] ** gp).prod(axis=1)) * gc).sum(axis=1)
    

    For example, we can use it to compute the gradient at the point [0, 1]:

    print(gradient([0, 1], poly.powers_, model.coef_))
    

    This gives:

    [0.13 0.38]
    

    The Hessian at a given point can be computed in a similar way.
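
    One straightforward (if less vectorized) sketch, built from the same poly.powers_ and model.coef_ arrays; the hessian name and the explicit loops are my own choices:

    def hessian(x, powers, coeffs):
        # Hessian of the polynomial defined by `powers` (poly.powers_) and
        # `coeffs` (model.coef_) at the point x. For a degree-2 fit this is
        # constant, but the loops also handle higher degrees.
        x = np.array(x, dtype=float)
        p = powers.shape[1]
        H = np.zeros((p, p))
        for i in range(p):
            for j in range(p):
                for e, c in zip(powers, coeffs):
                    # Coefficient factor picked up by differentiating the
                    # monomial w.r.t. x_i and then x_j
                    factor = e[i] * (e[i] - 1) if i == j else e[i] * e[j]
                    if factor == 0:
                        continue
                    # Exponents that remain after the two differentiations
                    de = e.copy()
                    de[i] -= 1
                    de[j] -= 1
                    H[i, j] += c * factor * np.prod(x ** de)
        return H

    print(hessian([0, 1], poly.powers_, model.coef_))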