
Recursive feature elimination (RFE) with random forest


I want to use recursive feature elimination (RFE) with a random forest for feature selection on my dataset. I came up with this code:

from sklearn.ensemble import RandomForestClassifier
from sklearn.feature_selection import RFE

# Create the RFE object; step=1 removes one feature per iteration until 6 remain
clf_rf_3 = RandomForestClassifier()
rfe = RFE(estimator=clf_rf_3, n_features_to_select=6, step=1)
rfe = rfe.fit(X_train, y_train)

print('Chosen best 6 features by RFE:', X_train.columns[rfe.support_])

But running it raises this error:

AttributeError: 'numpy.ndarray' object has no attribute 'columns'

That makes sense, because X_train is a numpy.ndarray, which has no columns attribute.

What I want is to find the names of the selected features, but most code I have found gives me either the number of selected features or their column indices.
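The mask and indices that RFE returns can be turned into names by indexing any sequence of labels that has the same column order as the fitted array. A minimal sketch, assuming synthetic data from `make_classification` and hypothetical placeholder names `f0`, `f1`, and so on in place of real column labels:

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.feature_selection import RFE

# Synthetic stand-in for the real data: a plain NumPy array with 10 columns
X, y = make_classification(n_samples=200, n_features=10, random_state=0)
# Hypothetical names, one per column, in the same order as the array's columns
feature_names = np.array([f"f{i}" for i in range(X.shape[1])])

rfe = RFE(estimator=RandomForestClassifier(random_state=0),
          n_features_to_select=6, step=1)
rfe.fit(X, y)

# Boolean mask over the original columns: True where the feature was kept
mask = rfe.support_
# Integer positions of the kept columns
indices = rfe.get_support(indices=True)
# ranking_ is 1 for every kept feature; larger values were eliminated earlier
ranking = rfe.ranking_

# Either the mask or the indices can index the name array
selected_by_mask = feature_names[mask]
selected_by_index = feature_names[indices]
print(selected_by_mask)
```

The mask and the index form select the same names; the index form is handy when the labels live in a plain Python list.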

I tried replacing X_train.columns[rfe.support_] with X_new.columns[rfe.support_] (where X_new is a DataFrame containing all my features before I scale it and split it into train and test sets), and I got a result. But I'm not sure whether this is the correct solution.
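One way to check the X_new approach is to confirm that scaling and splitting keep column positions intact, so a mask computed on X_train still lines up with X_new.columns. A minimal sketch, assuming `StandardScaler` and `train_test_split` were the preprocessing steps; the data, the `feat_i` names, and `n_features_to_select=4` are hypothetical:

```python
import numpy as np
import pandas as pd
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.feature_selection import RFE
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler

# Hypothetical stand-in for X_new: a DataFrame with named columns
X_arr, y = make_classification(n_samples=200, n_features=8, random_state=0)
X_new = pd.DataFrame(X_arr, columns=[f"feat_{i}" for i in range(8)])

# Scaling returns a NumPy array by default, which is why X_train has no .columns
X_scaled = StandardScaler().fit_transform(X_new)
X_train, X_test, y_train, y_test = train_test_split(
    X_scaled, y, test_size=0.25, random_state=0)

# Column j of X_scaled is still X_new column j, standardized (population std)
col = X_new["feat_0"]
assert np.allclose(X_scaled[:, 0], (col - col.mean()) / col.std(ddof=0))

rfe = RFE(RandomForestClassifier(random_state=0), n_features_to_select=4, step=1)
rfe.fit(X_train, y_train)

# Since no step reordered the columns, the mask can index X_new's labels
selected_names = X_new.columns[rfe.support_]
print(list(selected_names))
```

`train_test_split` only shuffles rows, so it cannot break this alignment; a step that drops or reorders columns (for example a `ColumnTransformer`) would.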


Solution

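  • This code snippet expects X_train to be a pandas DataFrame. For a NumPy array, X_train[:, rfe.support_] selects the chosen columns. To get their names, X_new.columns[rfe.support_] is valid as long as X_new and X_train have their columns in the same order; StandardScaler and train_test_split do not reorder columns, so that approach works.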