I have been trying to use the scikit-learn library to solve this problem. Roughly:
from sklearn.preprocessing import PolynomialFeatures
from sklearn.linear_model import LinearRegression
# Make or load an n x p data matrix X and n x 1 array y of the corresponding
# function values.
poly = PolynomialFeatures(degree=2)
Xp = poly.fit_transform(X)
model = LinearRegression()
model.fit(Xp, y)
# Approximate the derivatives of the gradient and Hessian using the relevant
# finite-difference equations and model.predict.
As the above illustrates, sklearn makes the design choice to separate polynomial regression into PolynomialFeatures and LinearRegression rather than combine these into a single function. This separation has conceptual advantages but also a major drawback: it effectively prevents model from offering gradient and hessian methods, and model would be significantly more useful if it did.
My current work-around uses finite-difference equations and model.predict to approximate the elements of the gradient and Hessian (as described here). But I don't love this approach: it is sensitive to floating-point error, and the "exact" information needed to build the gradient and Hessian is already contained in model.coef_.
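For reference, the work-around looks roughly like this; it is only a minimal sketch using central differences, and the helper name fd_gradient and the step size h are my own choices, not anything provided by sklearn:
import numpy as np

def fd_gradient(x, model, poly, h=1e-5):
    # Evaluate the fitted polynomial at a single point z.
    f = lambda z: model.predict(poly.transform(np.atleast_2d(z)))[0]
    x = np.asarray(x, dtype=float)
    # Central difference in each coordinate direction.
    return np.array([(f(x + h * e) - f(x - h * e)) / (2 * h) for e in np.eye(len(x))])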
Is there any more elegant or accurate method to fit a p-dimensional polynomial and find its gradient and Hessian within Python? I would be fine with one that uses a different library.
To compute the gradient or the Hessian of a polynomial, one needs to know the exponents of the variables in each monomial and the corresponding monomial coefficients. The first piece of this information is provided by poly.powers_, the second by model.coef_:
from sklearn.preprocessing import PolynomialFeatures
from sklearn.linear_model import LinearRegression
import numpy as np
np.set_printoptions(precision=2, suppress=True)
X = np.arange(6).reshape(3, 2)
y = np.arange(3)
poly = PolynomialFeatures(degree=2)
Xp = poly.fit_transform(X)
model = LinearRegression()
model.fit(Xp, y)
print("Exponents:")
print(poly.powers_.T)
print("Coefficients:")
print(model.coef_)
This gives:
Exponents:
[[0 1 0 2 1 0]
[0 0 1 0 1 2]]
Coefficients:
[ 0. 0.13 0.13 -0.12 -0. 0.13]
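Each column above pairs an exponent vector with its coefficient, so (up to the two-decimal rounding, and ignoring model.intercept_) the fitted polynomial reads:
f(x0, x1) ≈ 0.13*x0 + 0.13*x1 - 0.12*x0**2 - 0.00*x0*x1 + 0.13*x1**2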
The following function can then be used to compute the gradient at a point given by an array x:
def gradient(x, powers, coeffs):
    x = np.array(x)
    # Exponents of each monomial after differentiation w.r.t. each variable,
    # clipped at 0 (those terms receive a zero coefficient below anyway).
    gp = np.maximum(0, powers[:, np.newaxis] - np.eye(powers.shape[1], dtype=int))
    gp = gp.transpose(1, 2, 0)
    # Coefficient of each monomial in each partial derivative.
    gc = coeffs * powers.T
    return (((x[:, np.newaxis] ** gp).prod(axis=1)) * gc).sum(axis=1)
For example, we can use it to compute the gradient at the point [0, 1]:
print(gradient([0, 1], poly.powers_, model.coef_))
This gives:
[0.13 0.38]
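As a quick sanity check against the printed coefficients: the partial derivative w.r.t. x0 at (0, 1) is 0.13 + 2*(-0.12)*0 + (-0.)*1 ≈ 0.13, and w.r.t. x1 it is 0.13 + (-0.)*0 + 2*0.13*1 ≈ 0.39, which agrees with the printed values up to the two-decimal rounding above.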
The Hessian at a given point can be computed in a similar way.
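For completeness, here is one possible sketch of a hessian function in the same spirit as gradient above. The function name and the exact broadcasting are my own choices; the underlying idea is just that differentiating a monomial w.r.t. x_i and x_j multiplies its coefficient by powers[m, i] * (powers[m, j] - delta_ij) and lowers the corresponding exponents:
def hessian(x, powers, coeffs):
    x = np.array(x)
    eye = np.eye(powers.shape[1], dtype=int)
    # hp[i, j, m, k]: exponent of variable k in monomial m after differentiating
    # w.r.t. variables i and j, clipped at 0 (such terms get a zero coefficient below).
    hp = np.maximum(0, powers[np.newaxis, np.newaxis]
                    - eye[:, np.newaxis, np.newaxis]
                    - eye[np.newaxis, :, np.newaxis])
    # hc[i, j, m]: coefficient of monomial m in the (i, j) second partial derivative.
    hc = coeffs * powers.T[:, np.newaxis] * (powers.T[np.newaxis] - eye[:, :, np.newaxis])
    return ((x ** hp).prod(axis=-1) * hc).sum(axis=-1)

print(hessian([0, 1], poly.powers_, model.coef_))
For the quadratic fitted above, the diagonal entries should equal twice the coefficients of x0**2 and x1**2, and the off-diagonal entries should equal the coefficient of x0*x1.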