python spline gam coefficients

Find spline knots by variable in Python


When fitting a LinearGAM in Python with spline_order=1 and n_splines=5, a piecewise-linear function is fitted for each variable:

import statsmodels.api as sm
from pygam import LinearGAM

data = sm.datasets.get_rdataset('mtcars').data

Y = data['mpg']
X = data.drop("mpg",axis=1)

model = LinearGAM(spline_order=1,n_splines=5).fit(X, Y)

Using the coef_ attribute of the fitted model, the coefficients of every spline can be recovered for further analysis:

model.coef_

However, how can we obtain the sections of each of the 5 splines for each variable?

As an example, for the cyl variable we would fit the following splines:

[Figure: splines fitted for the cyl variable]

The sections are determined by the knots, so in the plot we would see the variable limits for the computed betas (i.e. 4-5, 5-6, 6-7, 7-8).

The only thing I could find in the documentation is model.edge_knots, which is described as:

array-like of floats of length 2. The minimum and maximum domain of the spline function.

In this example, for cyl it corresponds to [4, 8].
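If the knots are spaced evenly between the two edge knots (an assumption on my part, not something the documentation quoted above states), the section limits could be reconstructed with plain NumPy. The helper name uniform_knots is hypothetical, and the knot count n_splines - spline_order + 1 is also an assumption about how the basis is built:

```python
import numpy as np

def uniform_knots(edge_knots, n_splines, spline_order):
    """Evenly spaced knots between the edge knots (hypothetical helper).

    Assumes the number of knots is n_splines - spline_order + 1.
    """
    lo, hi = edge_knots
    n_knots = n_splines - spline_order + 1
    return np.linspace(lo, hi, n_knots)

# Values taken from the cyl example: edge knots [4, 8], 5 linear splines
knots = uniform_knots([4, 8], n_splines=5, spline_order=1)

# Consecutive knot pairs give the limits of each section
sections = list(zip(knots[:-1], knots[1:]))
print(knots)
print(sections)
```

Under that assumption, the cyl example yields the four sections 4-5, 5-6, 6-7 and 7-8 listed above.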


Solution

  • I eventually came up with a solution that uses the partial dependence of each term to locate where the slope of the fitted function changes. Taking second differences of the partial dependence gives the change of slope, and the points where it is non-zero are the breakpoints.

     

       

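    import numpy as np

    # model_gam is the fitted LinearGAM; i is the index of the term being inspected
    XX = model_gam.generate_X_grid(term=i)
    pdep, confi = model_gam.partial_dependence(term=i, X=XX, width=0.95)

    # first differences approximate the slope; round to suppress floating-point noise
    first_diff = np.round(np.diff(pdep), 3)
    # second differences are non-zero only where the slope changes
    second_diff = np.abs(np.diff(first_diff))
    # second_diff[k] is centred on grid point k + 1
    values_list = XX[np.where(second_diff > 0)[0] + 1, i]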
    

     

    This leads to the following result, which is suboptimal:

    [Figure: result of the approach above]

    But it seems a good enough first approach.
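To check the second-difference idea in isolation, here is a minimal sketch on a synthetic piecewise-linear curve, with no fitted model involved. The function name slope_change_points and the example data are made up for illustration, and the slope is divided by the grid spacing, which the snippet above does not do:

```python
import numpy as np

def slope_change_points(x, y, decimals=3):
    """Return the x values where the rounded slope of y(x) changes."""
    slopes = np.round(np.diff(y) / np.diff(x), decimals)
    change = np.abs(np.diff(slopes)) > 0
    # change[k] compares segments k and k + 1, which meet at x[k + 1]
    return x[np.where(change)[0] + 1]

# Synthetic piecewise-linear curve with kinks at 5, 6 and 7
x = np.linspace(4, 8, 9)  # grid step 0.5, so every kink lies on the grid
y = np.interp(x, [4, 5, 6, 7, 8], [0.0, 1.0, 3.0, 2.0, 2.5])

breakpoints = slope_change_points(x, y)
print(breakpoints)
```

When a kink falls between two grid points instead of on one, the second difference is non-zero at both neighbouring points, which may be one reason the detected values look suboptimal on the default grid.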