I want to use recursive feature elimination (RFE) with a random forest for feature selection on my dataset. I came up with this code:
from sklearn.ensemble import RandomForestClassifier
from sklearn.feature_selection import RFE
# Create the RFE object and rank each feature
clf_rf_3 = RandomForestClassifier()
rfe = RFE(estimator=clf_rf_3, n_features_to_select=6, step=1)
rfe = rfe.fit(X_train, y_train)
print('Chosen best 6 features by rfe:', X_train.columns[rfe.support_])
But after running it I got this error:
AttributeError: 'numpy.ndarray' object has no attribute 'columns'
That makes sense, because X_train is a 'numpy.ndarray' and doesn't have columns.
What I want is to find the names of the selected features. But most code I have found gives me either the number of selected features or their relative column indices.
I tried replacing X_train.columns[rfe.support_]
with X_new.columns[rfe.support_]
(where X_new is a DataFrame containing all my features before I scale it and split it into train and test sets), and I got a result. But I'm not sure whether this is the right solution.
This code snippet expects a pandas DataFrame. For a NumPy array, X_train[:, rfe.support_]
should work, though it returns the values of the selected columns rather than their names.
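To get the names themselves, indexing the original DataFrame's columns with the mask (as the question tries with X_new) looks reasonable, provided the columns of X_train are still in the same order as in X_new. Here is a minimal sketch of that idea. The synthetic data from make_classification, the feature_0 ... feature_9 column names, and the StandardScaler step are assumptions standing in for the real dataset and preprocessing:

```python
import pandas as pd
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.feature_selection import RFE
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler

# Hypothetical stand-in for the original DataFrame of features (X_new in the question).
X_raw, y = make_classification(n_samples=200, n_features=10,
                               n_informative=5, random_state=0)
X_new = pd.DataFrame(X_raw, columns=[f"feature_{i}" for i in range(10)])

# Scaling returns a NumPy array, so the column names are lost at this point.
X_scaled = StandardScaler().fit_transform(X_new)
X_train, X_test, y_train, y_test = train_test_split(X_scaled, y, random_state=0)

clf_rf_3 = RandomForestClassifier(n_estimators=50, random_state=0)
rfe = RFE(estimator=clf_rf_3, n_features_to_select=6, step=1)
rfe = rfe.fit(X_train, y_train)

# rfe.support_ is a boolean mask in column order, so it can index
# the column names of the DataFrame the array was built from.
selected = X_new.columns[rfe.support_]
print("Selected features:", list(selected))
```

Scaling and train/test splitting change or drop rows, not columns, so the mask still lines up with X_new.columns. If any step before RFE drops or reorders columns, the names would have to be taken from the data as it looked right before fitting.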