When fitting a linear GAM model in Python with spline_order=1 and n_splines=5, a piecewise-linear function is fitted:
import statsmodels.api as sm
from pygam import LinearGAM
data = sm.datasets.get_rdataset('mtcars').data
Y = data['mpg']
X = data.drop("mpg",axis=1)
model = LinearGAM(spline_order=1,n_splines=5).fit(X, Y)
Using the coef_ attribute of the fitted model, the coefficients of every spline can be recovered for further analysis:
model.coef_
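For that further analysis, it can help to regroup the flat coefficient vector by term. The following is a minimal sketch, not part of pyGAM's API. It assumes the vector holds one block of n_splines coefficients per predictor, in column order, followed by a single intercept, which is how the default model above appears to be laid out. The helper split_coefficients and the stand-in array fake_coef are hypothetical. With the fitted model one would pass model.coef_ instead.

```python
import numpy as np

def split_coefficients(coef, n_terms, n_splines):
    """Split a flat coefficient vector into one row per spline term.

    Assumption: n_terms blocks of n_splines coefficients each,
    followed by one intercept as the last entry.
    """
    coef = np.asarray(coef, dtype=float)
    expected = n_terms * n_splines + 1
    if coef.size != expected:
        raise ValueError(f"expected {expected} coefficients, got {coef.size}")
    per_term = coef[:-1].reshape(n_terms, n_splines)
    intercept = coef[-1]
    return per_term, intercept

# Stand-in for model.coef_: 10 predictors x 5 splines + 1 intercept
fake_coef = np.arange(51, dtype=float)
per_term, intercept = split_coefficients(fake_coef, n_terms=10, n_splines=5)
print(per_term[0])  # the 5 coefficients of the first term
print(intercept)    # the last entry of the vector
```

The length check is there so that a different layout (for example extra terms or no intercept) fails loudly instead of silently misaligning the blocks.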
However, how can we obtain the sections of each of the 5 splines for each variable?
As an example, for the cyl variable we would fit the following splines:
The sections are determined by the knots, so in the plot we would see the limits of the variable for each computed beta (i.e. 4-5, 5-6, 6-7, 7-8).
The only thing I find in the documentation is model.edge_knots, which is described as:
array-like of floats of length 2. The minimum and maximum domain of the spline function.
In this example, for cyl it corresponds to [4, 8].
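If the knots are spaced uniformly between the two edge knots, the section limits can be reconstructed from that pair alone. The sketch below rests on that assumption, and on the further assumption that a term with n_splines basis functions of order spline_order has n_splines - spline_order sections. Both are consistent with the 4-5, 5-6, 6-7, 7-8 sections expected for cyl, but neither is taken from the documentation. The function name section_limits is hypothetical.

```python
import numpy as np

def section_limits(edge_knots, n_splines, spline_order):
    """Return (left, right) limits of each section of a spline term.

    Assumption: knots are uniformly spaced between the edge knots and
    there are n_splines - spline_order sections.
    """
    lo, hi = edge_knots
    n_sections = n_splines - spline_order
    if n_sections < 1:
        raise ValueError("n_splines must exceed spline_order")
    knots = np.linspace(lo, hi, n_sections + 1)
    return list(zip(knots[:-1], knots[1:]))

# Edge knots reported for cyl in the example above
sections = section_limits((4, 8), n_splines=5, spline_order=1)
for left, right in sections:
    print(f"{left:g}-{right:g}")
```

For a fitted model, the pair (4, 8) would be replaced by the edge knots of the term of interest.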
Finally, I have come up with a solution that uses the partial dependence to compute the fitted function and locate its slope changes: I take second differences of the partial dependence, which are nonzero wherever the slope changes.
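import numpy as np
i = 0  # index of the term to inspect (cyl is the first column of X)
XX = model.generate_X_grid(term=i)
pdep, confi = model.partial_dependence(term=i, X=XX, width=0.95)
# Round the first differences so floating-point noise is not read as a slope change
first_diff = np.round(np.diff(pdep), 3)
second_diff = np.abs(np.diff(first_diff))
# second_diff[k] is centred on grid point k + 1, hence the offset
values_list = XX[np.where(second_diff > 0)[0] + 1, i]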
This leads to the following result, which is suboptimal:
But it seems a good enough first approach.
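One possible refinement, sketched here on a synthetic piecewise-linear curve rather than on the fitted model: a knot that falls between two grid points makes two neighbouring second differences nonzero, so consecutive detections can be merged into a single estimate. The function slope_change_points, the tolerance tol and the synthetic curve are illustrative assumptions. With the model, the partial dependence and the matching column of the grid would take the place of curve and grid.

```python
import numpy as np

def slope_change_points(x, y, tol=1e-6):
    """Estimate kink locations of a piecewise-linear curve on a uniform grid."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    second = np.abs(np.diff(y, n=2))
    # second[k] is centred on x[k + 1], hence the + 1
    hits = np.where(second > tol)[0] + 1
    if hits.size == 0:
        return np.array([])
    # Runs of consecutive grid indices belong to the same kink
    groups = np.split(hits, np.where(np.diff(hits) > 1)[0] + 1)
    # Use the mean grid position of each run as the estimate
    return np.array([x[g].mean() for g in groups])

# Synthetic piecewise-linear curve with kinks at 5, 6 and 7
grid = np.linspace(4, 8, 100)
curve = np.interp(grid, [4, 5, 6, 7, 8], [0.0, 1.0, 0.5, 2.0, 2.2])
estimates = slope_change_points(grid, curve)
print(np.round(estimates, 2))  # each estimate lies within one grid step of a kink
```

The precision is limited by the grid spacing, so a finer grid gives tighter estimates. A threshold on the absolute second difference plays the role of the rounding step in the snippet above.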