I am trying to make a logistic regression model with RFE feature selection.
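from sklearn.feature_selection import RFE
from sklearn.linear_model import LogisticRegression

weights = {0: 1, 1: 5}
model = LogisticRegression(solver='lbfgs', max_iter=5000, class_weight=weights)
rfe = RFE(model, n_features_to_select=25)
rfe_model = rfe.fit(X_train, y_train)
print(rfe_model.support_)
print(rfe_model.ranking_)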
And I get:
array([ True, True, True, True, True, False, False, False, False, False])
array([1, 1, 1, 1, 1, 6, 4, 3, 2, 5])
How can I use rfe_model.support_ to extract the list of chosen features (subset the data frame) and build a model with only those features, other than manually writing a for loop and subsetting the list of features myself? Is there a more elegant way?
Bonus question: Where can I find more information on feature selection for logistic regression (other than the backward, forward, and stepwise methods)?
Use Pipeline for this, like:
selector = RFE(LogisticRegression(), n_features_to_select=25)
final_clf = SVC()
rfe_model = Pipeline([("rfe", selector), ("model", final_clf)])
Now when you call rfe_model.fit(X, y), Pipeline will first transform the data (i.e. select features) with RFE and send that transformed data to SVC. You can now also use GridSearchCV, cross_validate and all other sorts of built-in functions on rfe_model.
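To tie this back to the original question about support_, here is a minimal sketch of the whole flow. It assumes the features live in a pandas DataFrame; the synthetic data, the column names f0 to f9, and the choice of 5 features are made up for illustration and stand in for X_train and y_train. Since support_ is a boolean mask over the input columns, it can index the DataFrame directly, with no loop needed.

```python
import pandas as pd
from sklearn.datasets import make_classification
from sklearn.feature_selection import RFE
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_validate
from sklearn.pipeline import Pipeline
from sklearn.svm import SVC

# Synthetic stand-in for X_train / y_train (assumption: 10 numeric features).
X_arr, y = make_classification(
    n_samples=200, n_features=10, n_informative=5, random_state=0
)
X = pd.DataFrame(X_arr, columns=[f"f{i}" for i in range(10)])

# Feature selection followed by the final classifier, as in the answer above.
selector = RFE(LogisticRegression(max_iter=5000), n_features_to_select=5)
rfe_model = Pipeline([("rfe", selector), ("model", SVC())])
rfe_model.fit(X, y)

# support_ is a boolean mask over the input columns,
# so it can be used to index the columns directly.
mask = rfe_model.named_steps["rfe"].support_
selected_features = X.columns[mask].tolist()
X_selected = X.loc[:, mask]

# The whole pipeline can be cross-validated; RFE is refit inside each fold.
scores = cross_validate(rfe_model, X, y, cv=5)

print(selected_features)
print(scores["test_score"].mean())
```

If you only need the reduced data frame and not the pipeline, the same mask works on a standalone fitted RFE object, e.g. X_train.loc[:, rfe.support_].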