Tags: python, machine-learning, prediction, coefficients, lasso-regression

How can I display predictor importance + feature name in multivariate regression?


I'm exploring a dataset with the goal of finding any interesting relationships (there are a bunch of variables of interest and I want to see which features or feature combinations predict them).

As a first approach I successfully fitted a multivariate (several target variables) regression with lasso.

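import numpy as np
from sklearn.linear_model import Lasso
from sklearn.model_selection import GridSearchCV
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler

pipeline = Pipeline([
    ('scaler', StandardScaler()),
    ('model', Lasso())])

# Tune the regularization strength alpha with 5-fold cross-validation
search = GridSearchCV(pipeline,
                      {'model__alpha': np.arange(0.1, 10, 0.1)},
                      cv=5, scoring="neg_mean_squared_error", verbose=3
                      )
search.fit(X_train, y_train)
search.best_params_
# With several targets, coef_ has shape (n_targets, n_features)
coefficients = search.best_estimator_.named_steps['model'].coef_
importance = np.abs(coefficients)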

Now I want to see the importance of the predictors INCLUDING their feature names, because importance is just a bunch of numbers.

I thought about creating an array with the column names of the features and targets and printing each name with its coefficient, but I'm not entirely sure how to ensure correspondence (that the correct names are displayed with the correct coefficients). Can anyone help me out?

Here is some additional info:

I'm also grateful for any other advice on which importance metrics to use or any suggestions regarding possible analyses.


Solution

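  • Coefficients are in the same order as the columns of X_train. With several targets, coef_ has one row per target (in the order of the columns of y_train) and one column per feature. I would not recommend applying "np.abs" to the coefficients, because that discards the sign, which tells you whether a feature pushes the prediction up or down. You can keep the sign as it is and still visualize the coefficients richly. See below.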

    I would create a pandas data frame like this:

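    import pandas as pd
    pd.options.plotting.backend = 'plotly'  # or use matplotlib
    # Assumes X_train and y_train are DataFrames; otherwise pass your own lists of names.
    # Rows are targets and columns are features, matching the shape of coef_.
    pdf = pd.DataFrame(data=coefficients, columns=X_train.columns, index=y_train.columns)
    fig = pdf.T.plot(kind='bar')  # T stands for transpose: features go on the x-axis
    fig.show()  # plotly figures are displayed with show(); there is no fig.plot()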
    

    I usually use plotly as the backend because I find its API more intuitive and its charts are interactive.
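
The name-to-coefficient correspondence described above can be checked on a small synthetic example. This is only a sketch: the data, the column names, and the fixed alpha=0.1 are made up for illustration, and it assumes X_train and y_train are pandas DataFrames. Each target's features are ranked by magnitude while the table itself keeps the sign.

```python
import numpy as np
import pandas as pd
from sklearn.linear_model import Lasso
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler

# Synthetic stand-in data; all column names here are hypothetical.
rng = np.random.default_rng(0)
X_train = pd.DataFrame(rng.normal(size=(200, 4)),
                       columns=["age", "height", "weight", "income"])
y_train = pd.DataFrame({
    "target_a": 3.0 * X_train["age"] - 2.0 * X_train["weight"]
                + rng.normal(scale=0.1, size=200),
    "target_b": 1.5 * X_train["income"] + rng.normal(scale=0.1, size=200),
})

pipeline = Pipeline([("scaler", StandardScaler()),
                     ("model", Lasso(alpha=0.1))])
pipeline.fit(X_train, y_train)

# With several targets, coef_ has shape (n_targets, n_features);
# atleast_2d also covers the single-target case, where coef_ is 1-D.
coefficients = np.atleast_2d(pipeline.named_steps["model"].coef_)

# Rows follow the column order of y_train, columns follow that of X_train.
coef_table = pd.DataFrame(coefficients,
                          index=y_train.columns,
                          columns=X_train.columns)

# Rank features per target by magnitude, keeping the signed values.
for target, row in coef_table.iterrows():
    ranked = row.reindex(row.abs().sort_values(ascending=False).index)
    print(target, ranked.round(2).to_dict())
```

Because the pipeline standardizes the features first, the coefficients are on a common scale, which is what makes comparing their magnitudes across features reasonable.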

    Once you have the figure object "fig", you can update its traces with color information and other properties. I usually compute percentiles and then assign colors from crimson red to olive green with a gentle color progression.

    For example, use the qcut function like this:

    pdf['color'] = pd.qcut(
        pdf[<col of interest>], 
        4, 
        labels=['limegreen', 'seagreen', 'green', 'darkgreen']
    ).to_list()
    

    You can use colors of your choice and pass them to the "marker" property of the Plotly trace.

    marker = {'color': pdf['color']},  # pass this as the marker argument when calling add_trace or update_traces on a plotly figure
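
As a rough, self-contained illustration of the quartile coloring above, the sketch below bins a made-up series of signed coefficients for a single target. The feature names and values are hypothetical, and only pandas is needed to compute the colors.

```python
import pandas as pd

# Hypothetical signed coefficients for one target, indexed by feature name.
coefs = pd.Series(
    [-2.1, -0.4, 0.0, 0.3, 0.9, 1.2, 2.5, 3.1],
    index=[f"feature_{i}" for i in range(8)],
    name="target_a",
)

# Four equal-sized bins by value; labels go from the lowest to the highest bin.
colors = pd.qcut(
    coefs, 4, labels=["limegreen", "seagreen", "green", "darkgreen"]
).astype(str)

# Keep each coefficient next to its color so the mapping is easy to inspect.
color_table = pd.DataFrame({"coefficient": coefs, "color": colors})
print(color_table)
```

The resulting list, `colors.to_list()`, is what would go into the marker dictionary. One caveat: lasso often sets many coefficients to exactly zero, and repeated values can produce duplicate bin edges, in which case qcut raises a ValueError unless the binning is adjusted.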