I used eli5 to apply the permutation procedure for feature importance. The documentation has some explanation and a small example, but it is not clear to me.
I am using a sklearn SVC model for a classification problem.
My question is: are these weights the change (decrease/increase) in accuracy when the specific feature is shuffled, OR are they the SVC weights of these features?
In this Medium article, the author states that these values show the reduction in model performance caused by reshuffling that feature, but I am not sure that is indeed the case.
Small example:
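```python
from sklearn import datasets
from sklearn.svm import SVC
import eli5
from eli5.sklearn import PermutationImportance

# import some data to play with: only the first two iris features
iris = datasets.load_iris()
X = iris.data[:, :2]
y = iris.target

clf = SVC(kernel='linear')
# 10-fold CV, 1000 shuffle rounds per fold, scored by accuracy
perms = PermutationImportance(clf, n_iter=1000, cv=10, scoring='accuracy').fit(X, y)
print(perms.feature_importances_)
print(perms.feature_importances_std_)
# Output from the original run (values change between runs because the shuffles are random):
# [0.38117333 0.16214 ]
# [0.1349115 0.11182505]

eli5.show_weights(perms)  # renders an HTML table in a Jupyter notebook
```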
I did some deep research.
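After going through the source code, here is what I believe happens when `cv` is used, i.e. when it is neither `"prefit"` nor `None`. I use a K-Fold scheme for my application. I also use an SVC model, so the score is accuracy in this case (accuracy is the default score of a scikit-learn classifier, and the example above also requests it explicitly with `scoring='accuracy'`).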
Looking at the `fit` method of the `PermutationImportance` object, the scores are computed by `_cv_scores_importances` (https://github.com/TeamHG-Memex/eli5/blob/master/eli5/sklearn/permutation_importance.py#L202). It applies the specified cross-validation scheme: for each fold, a fresh copy of the estimator is fitted on the training split, and `base_scores` and `feature_importances` are computed on the test split (via `_get_score_importances` inside `_cv_scores_importances`).
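To make the fold loop concrete, here is a minimal NumPy-only sketch of the structure described above. It is not eli5's code: `ToyNearestCentroid` is a hypothetical stand-in for SVC, `cv_scores_importances` is an illustrative name, and the folds come from `np.array_split` (contiguous, unshuffled). If eli5 resolves an integer `cv` through scikit-learn's `check_cv` (as I believe it does), a classifier gets a `StratifiedKFold`, so the real folds can differ from the plain split used here.

```python
import numpy as np


class ToyNearestCentroid:
    """Hypothetical minimal classifier standing in for SVC (NumPy only)."""

    def fit(self, X, y):
        self.classes_ = np.unique(y)
        self.centroids_ = np.array([X[y == c].mean(axis=0) for c in self.classes_])
        return self

    def predict(self, X):
        # squared distance from every sample to every class centroid
        dists = ((X[:, None, :] - self.centroids_[None, :, :]) ** 2).sum(axis=2)
        return self.classes_[dists.argmin(axis=1)]


def accuracy(model, X, y):
    return float(np.mean(model.predict(X) == y))


def cv_scores_importances(X, y, n_splits=5, n_iter=3, seed=0):
    """Per fold: fit on train, then base score and score decreases on test."""
    rng = np.random.default_rng(seed)
    all_idx = np.arange(len(X))
    base_scores, feature_importances = [], []
    # contiguous, unshuffled folds (similar in spirit to KFold(shuffle=False))
    for test in np.array_split(all_idx, n_splits):
        train = np.setdiff1d(all_idx, test)
        model = ToyNearestCentroid().fit(X[train], y[train])  # fresh model per fold
        base = accuracy(model, X[test], y[test])  # score on the non-shuffled test fold
        for _ in range(n_iter):
            decreases = np.empty(X.shape[1])
            for col in range(X.shape[1]):
                X_perm = X[test]  # fancy indexing returns a copy
                X_perm[:, col] = rng.permutation(X_perm[:, col])
                decreases[col] = base - accuracy(model, X_perm, y[test])
            base_scores.append(base)  # base score repeated once per iteration
            feature_importances.append(decreases)
    return base_scores, feature_importances


rng = np.random.default_rng(42)
X = rng.normal(size=(120, 2))
y = (X[:, 0] > 0).astype(int)  # only feature 0 carries the label
base_scores, importances = cv_scores_importances(X, y, n_splits=5, n_iter=3)
print(len(base_scores), len(importances))  # 15 15 (5 folds x 3 iterations)
print(np.mean(importances, axis=0))
```

Each test fold is scored only by the model trained without it, so every importance is measured on held-out data.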
The `get_score_importances` function (https://github.com/TeamHG-Memex/eli5/blob/master/eli5/permutation_importance.py#L55) shows that `base_score` is the score on the non-shuffled data, and that the feature importances (called `scores_decreases` there) are defined as non-shuffled score minus shuffled score (see https://github.com/TeamHG-Memex/eli5/blob/master/eli5/permutation_importance.py#L93).
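The definition above (importance = non-shuffled score minus shuffled score) can be sketched in a few lines. This is a simplified re-implementation for illustration, assuming a NumPy `Generator` for the shuffles; eli5 manages its random state differently, and `toy_score` is a hypothetical model that only looks at feature 0.

```python
import numpy as np


def get_score_importances(score_func, X, y, n_iter=5, random_state=None):
    """Simplified sketch: base score minus score with one column shuffled.

    Returns (base_score, scores_decreases), where scores_decreases holds
    n_iter arrays with one value per feature.
    """
    rng = np.random.default_rng(random_state)
    base_score = score_func(X, y)  # score on the non-shuffled data
    scores_decreases = []
    for _ in range(n_iter):
        shuffled_scores = np.empty(X.shape[1])
        for col in range(X.shape[1]):
            X_perm = X.copy()
            X_perm[:, col] = rng.permutation(X_perm[:, col])  # shuffle this column only
            shuffled_scores[col] = score_func(X_perm, y)
        # positive value = the score dropped when this feature was shuffled
        scores_decreases.append(base_score - shuffled_scores)
    return base_score, scores_decreases


def toy_score(X, y):
    """Hypothetical 'model': predicts 1 when feature 0 > 0, ignores feature 1."""
    y_pred = (X[:, 0] > 0).astype(int)
    return float(np.mean(y_pred == y))


data_rng = np.random.default_rng(0)
X = data_rng.normal(size=(200, 2))
y = (X[:, 0] > 0).astype(int)
base, decreases = get_score_importances(toy_score, X, y, n_iter=10, random_state=1)
print(base)                        # 1.0, the rule matches the labels exactly
print(np.mean(decreases, axis=0))  # feature 1 is exactly 0.0: the model never reads it
```

Because `toy_score` never reads feature 1, shuffling it cannot change the score, so every decrease for that feature is exactly zero. This is the sense in which the values measure a drop in performance rather than a model weight.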
Finally, `feature_importances_std_` is the standard deviation of these per-fold, per-iteration importances (https://github.com/TeamHG-Memex/eli5/blob/master/eli5/sklearn/permutation_importance.py#L209), and `feature_importances_` is their mean, i.e. the average of (non-shuffled score minus shuffled score). So the values are drops in accuracy caused by shuffling each feature, not the SVC coefficients.
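A small worked example of the aggregation step, using made-up numbers rather than real output. It assumes, as the linked line appears to do, that the mean and the standard deviation are taken over all (fold, iteration) rows, with NumPy's default population SD (`ddof=0`). In the question's setup (`cv=10`, `n_iter=1000`) the same average would run over 10 × 1000 = 10,000 rows per feature.

```python
import numpy as np

# Illustrative, made-up decreases (base score minus shuffled score):
# one row per (fold, iteration) pair, one column per feature.
# Here: 2 folds x 2 iterations = 4 rows, 2 features.
results = [
    np.array([0.40, 0.10]),
    np.array([0.30, 0.20]),
    np.array([0.50, 0.10]),
    np.array([0.40, 0.20]),
]

feature_importances = np.mean(results, axis=0)     # plays the role of feature_importances_
feature_importances_std = np.std(results, axis=0)  # plays the role of feature_importances_std_ (ddof=0)

print(feature_importances)      # approximately [0.4, 0.15]
print(feature_importances_std)  # approximately [0.0707, 0.05]
```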