I created a Pipeline with RFE and RandomForestClassifier in it and then applied RandomizedSearchCV to find the best hyperparameter values for both. This is what my code looks like:
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.feature_selection import RFE
from sklearn.pipeline import Pipeline
from sklearn.model_selection import RandomizedSearchCV
steps = [
("rfe", RFE(estimator = RandomForestClassifier(random_state = 42))),
("est", RandomForestClassifier())
]
rf_clf_pl = Pipeline(steps = steps)
params = {
"rfe__n_features_to_select" : range(2, smote_X_train.shape[1] + 1),
"est__random_state" : np.linspace(0, 42, 5).astype(int),
"est__n_estimators" : range(50, 201, 10),
"est__max_depth" : [None] + list(range(5, max_depth, 3)),
"est__max_leaf_nodes" : [None] + list(range(100, max_leaf_nodes, 20))
}
rs = RandomizedSearchCV(estimator = rf_clf_pl, cv = 4, param_distributions = params, n_jobs = -1, n_iter = 100, random_state = 42)
rs.fit(smote_X_train, smote_y_train)
To get the retained features, I tried the code below but got an error:
rf_clf_pl.named_steps["rfe"].support_
Error -
AttributeError Traceback (most recent call last)
<ipython-input-53-c73290f0e090> in <module>()
----> 1 rf_clf_pl.named_steps["rfe"].support_
AttributeError: 'RFE' object has no attribute 'support_'
How can I get the names of the retained features?
You can access the retained features of the best estimator as follows:
rs.best_estimator_.named_steps['rfe'].support_
Namely, you should access the best_estimator_ attribute of the fitted RandomizedSearchCV instance, i.e. the pipeline re-fitted on the whole training set with the best hyperparameters found, which is available thanks to the default parameter refit=True of RandomizedSearchCV.
Accessing support_ on the pipeline instance rf_clf_pl does not work because that pipeline itself was never fitted. RandomizedSearchCV clones the estimator you pass in for every candidate and fold, so calling .fit() on the search leaves the original pipeline untouched. The only fitted pipeline the search gives back is best_estimator_, as described above.
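To make the cloning behaviour concrete, here is a minimal sketch on the iris data. The tiny search space and the names pipe and search are only illustrative choices, not part of your setup:

```python
from sklearn.datasets import load_iris
from sklearn.ensemble import RandomForestClassifier
from sklearn.feature_selection import RFE
from sklearn.model_selection import RandomizedSearchCV
from sklearn.pipeline import Pipeline

X, y = load_iris(return_X_y=True)

# Small forests and a tiny search space keep this quick; values are illustrative.
pipe = Pipeline([
    ("rfe", RFE(estimator=RandomForestClassifier(n_estimators=10, random_state=0))),
    ("est", RandomForestClassifier(n_estimators=10, random_state=0)),
])

search = RandomizedSearchCV(
    pipe,
    param_distributions={"rfe__n_features_to_select": [2, 3]},
    n_iter=2,
    cv=3,
    random_state=0,
)
search.fit(X, y)

# The pipeline passed to the search is only cloned, so it stays unfitted.
original_is_fitted = hasattr(pipe.named_steps["rfe"], "support_")

# The refitted clone stored in best_estimator_ does carry the fitted attributes.
best_is_fitted = hasattr(search.best_estimator_.named_steps["rfe"], "support_")

print(original_is_fitted, best_is_fitted)
print(search.best_estimator_ is pipe)
```

The first check is the situation from the question (no support_ on the original pipeline), the second is the one that works.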
Here's an example:
import numpy as np
from sklearn.datasets import load_iris
from sklearn.ensemble import RandomForestClassifier
from sklearn.feature_selection import RFE
from sklearn.pipeline import Pipeline
from sklearn.model_selection import RandomizedSearchCV, train_test_split
iris = load_iris(as_frame=True)
X, y = iris.data, iris.target
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)
steps = [
("rfe", RFE(estimator = RandomForestClassifier(random_state = 42))),
("est", RandomForestClassifier())
]
rf_clf_pl = Pipeline(steps = steps)
params = {
"rfe__n_features_to_select" : range(2, X_train.shape[1] + 1),
"est__random_state" : np.linspace(0, 42, 5).astype(int),
"est__n_estimators" : range(50, 201, 10),
"est__max_depth" : [None] + list(range(5, 16, 3)),
"est__max_leaf_nodes" : [None] + list(range(100, 201, 20))
}
rs = RandomizedSearchCV(estimator = rf_clf_pl, cv = 4, param_distributions = params, n_jobs = -1, n_iter = 100, random_state = 42)
rs.fit(X_train, y_train)
rs.best_estimator_.named_steps['rfe'].support_
Finally, if you want the explicit names of the retained features, you can retrieve them via rs.feature_names_in_[np.where(rs.best_estimator_.named_steps['rfe'].support_)[0]]. This works here because X_train is a DataFrame, so feature_names_in_ is defined on the fitted search.