I'm trying to compute permutation importances for a RandomForestClassifier on a small sample of data. The model's built-in feature importances work, but the permutation importances all come back as zero.
This is the code:
Input1:
from sklearn.ensemble import RandomForestClassifier

X_train_encoded = encoder.fit_transform(X_train1)
X_val_encoded = encoder.transform(X_val1)
model = RandomForestClassifier(n_estimators=300, random_state=25,
                               n_jobs=-1, max_depth=2)
model.fit(X_train_encoded, y_train1)
Output1:
RandomForestClassifier(bootstrap=True, ccp_alpha=0.0, class_weight=None,
criterion='gini', max_depth=2, max_features='auto',
max_leaf_nodes=None, max_samples=None,
min_impurity_decrease=0.0, min_impurity_split=None,
min_samples_leaf=1, min_samples_split=2,
min_weight_fraction_leaf=0.0, n_estimators=300,
n_jobs=-1, oob_score=False, random_state=25, verbose=0,
warm_start=False)
Input2:
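from eli5.sklearn import PermutationImportance

# The model is already fitted, so eli5 scores it on the validation set only
permuter = PermutationImportance(
    model,
    scoring='accuracy',
    n_iter=3,
    random_state=25
)
permuter.fit(X_val_encoded, y_val1)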
Output2:
PermutationImportance(cv='prefit',
estimator=RandomForestClassifier(bootstrap=True,
ccp_alpha=0.0,
class_weight=None,
criterion='gini',
max_depth=2,
max_features='auto',
max_leaf_nodes=None,
max_samples=None,
min_impurity_decrease=0.0,
min_impurity_split=None,
min_samples_leaf=1,
min_samples_split=2,
min_weight_fraction_leaf=0.0,
n_estimators=300,
n_jobs=-1,
oob_score=False,
random_state=25,
verbose=0,
warm_start=False),
n_iter=3, random_state=25, refit=True,
scoring='accuracy')
(PROBLEM) Input3:
import pandas as pd

feature_names = X_val_encoded.columns.tolist()
pd.Series(permuter.feature_importances_, feature_names).sort_values()
(PROBLEM) Output3:
Player 0.0
POS 0.0
ATT 0.0
YDS 0.0
TDS 0.0
REC 0.0
YDS.1 0.0
TDS.1 0.0
FL 0.0
FPTS 0.0
Overall 0.0
pos_adp 0.0
dtype: float64
I expect to get values here, but instead I get zeros - am I doing something wrong or is that a possible result?
In: permuter.feature_importances_
Out: array([0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0.])
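It turned out the issue was with the data I was passing in, not the code itself.
The data had fewer than 70 observations. Once I added more observations (just under 400), I got non-zero permutation importances as expected. My guess is that with so few validation rows and trees limited to max_depth=2, shuffling a single column never changed any prediction, so the accuracy stayed the same and every importance came out as exactly zero.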
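For anyone wondering how an importance can be exactly 0.0, here is a minimal sketch of what a permutation importance measures: the drop in accuracy after shuffling one column at a time. This is not eli5's actual implementation. `permutation_importance_sketch` and `toy_predict` are hypothetical names made up for illustration, the data is synthetic, and the toy "model" is a stand-in that only looks at column 0.

```python
import numpy as np


def permutation_importance_sketch(predict, X, y, n_iter=3, seed=25):
    """Mean drop in accuracy when each column of X is shuffled in turn.

    Hypothetical helper for illustration; not part of eli5 or scikit-learn.
    """
    rng = np.random.default_rng(seed)
    baseline = np.mean(predict(X) == y)  # accuracy on the untouched data
    importances = np.zeros(X.shape[1])
    for col in range(X.shape[1]):
        drops = []
        for _ in range(n_iter):
            X_perm = X.copy()
            # Shuffle one column to break its link with the target
            X_perm[:, col] = rng.permutation(X_perm[:, col])
            drops.append(baseline - np.mean(predict(X_perm) == y))
        importances[col] = np.mean(drops)
    return importances


def toy_predict(X):
    """Toy stand-in for a fitted model: it only ever uses column 0."""
    return (X[:, 0] > 0).astype(int)


# Synthetic validation data (an assumption, not the data from the question)
data_rng = np.random.default_rng(0)
X_val = data_rng.normal(size=(400, 3))
y_val = (X_val[:, 0] > 0).astype(int)

importances = permutation_importance_sketch(toy_predict, X_val, y_val)
print(importances)
```

Columns 1 and 2 score exactly zero because shuffling them never changes a prediction, so the accuracy is identical before and after. The same thing can plausibly happen for every column when the validation set is tiny: accuracy only moves in steps of 1/n, and if no prediction flips, the drop is zero. If you are on scikit-learn 0.22 or later, `sklearn.inspection.permutation_importance` is another way to cross-check the numbers.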