
sklearn RFE with logistic regression


I am trying to make a logistic regression model with RFE feature selection.

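from sklearn.feature_selection import RFE
from sklearn.linear_model import LogisticRegression

weights = {0: 1, 1: 5}  # errors on class 1 cost five times more
model = LogisticRegression(solver='lbfgs', max_iter=5000, class_weight=weights)
rfe = RFE(model, n_features_to_select=25)  # keep the 25 best features
rfe_model = rfe.fit(X_train, y_train)  # fit() returns the fitted RFE itself
print(rfe_model.support_)  # boolean mask of the selected features
print(rfe_model.ranking_)  # 1 = selected; higher = eliminated earlier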

And I get:

array([ True,  True,  True,  True,  True, False, False, False, False, False])
array([1, 1, 1, 1, 1, 6, 4, 3, 2, 5])
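To see how the two attributes relate, here is a small self-contained sketch on synthetic data. The make_classification data, the 10-feature shape and n_features_to_select=5 are assumptions picked to mirror the output above, not the original dataset. It also shows that rfe.transform() already applies the mask, which is one way to subset the data without a loop.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.feature_selection import RFE
from sklearn.linear_model import LogisticRegression

# Synthetic stand-in for X_train / y_train (assumption: 10 features).
X_train, y_train = make_classification(
    n_samples=300, n_features=10, n_informative=5, random_state=0
)

model = LogisticRegression(solver="lbfgs", max_iter=5000, class_weight={0: 1, 1: 5})
rfe = RFE(model, n_features_to_select=5).fit(X_train, y_train)

print(rfe.support_)  # True = kept
print(rfe.ranking_)  # 1 for kept features; larger = eliminated earlier

# A feature is selected exactly when its rank is 1.
assert np.array_equal(rfe.support_, rfe.ranking_ == 1)

# transform() applies the boolean mask for you.
X_selected = rfe.transform(X_train)
assert np.array_equal(X_selected, X_train[:, rfe.support_])
```

With the default step=1, RFE drops one feature per iteration, so the five eliminated features get the distinct ranks 2 to 6, just like the ranking_ printed in the question.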

How can I use rfe_model.support_ to extract the list of chosen features (i.e. subset the data frame) and build a model with only those features, without doing it manually with a for loop that subsets the list of feature names? Is there a more elegant way?

Bonus question: Where can I find more information on feature selection for logistic regression (other than the backward, forward and stepwise methods)?


Solution

  • Use Pipeline for this, like:

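    from sklearn.feature_selection import RFE
    from sklearn.linear_model import LogisticRegression
    from sklearn.pipeline import Pipeline
    from sklearn.svm import SVC

    selector = RFE(LogisticRegression(), n_features_to_select=25)  # step 1: select 25 features
    final_clf = SVC()  # step 2: trained only on the selected features
    rfe_model = Pipeline([("rfe", selector), ("model", final_clf)])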

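    When you call rfe_model.fit(X, y), the Pipeline first fits RFE and uses it to transform the data (i.e. keep only the selected features), then fits the SVC on that reduced data. Calling rfe_model.predict(X_new) applies the same feature selection before predicting, so you never have to subset the columns by hand. Because rfe_model behaves like any other estimator, you can also pass it to GridSearchCV, cross_validate and the other built-in model-selection tools.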
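A runnable sketch of the pipeline approach, assuming a pandas DataFrame with hypothetical column names f0 to f9 generated by make_classification. It cross-validates the pipeline, tunes n_features_to_select with GridSearchCV (parameters are addressed as step name + "__" + parameter name), and, for the original question, turns support_ into column names with boolean indexing instead of a loop.

```python
import pandas as pd
from sklearn.datasets import make_classification
from sklearn.feature_selection import RFE
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import GridSearchCV, cross_validate
from sklearn.pipeline import Pipeline
from sklearn.svm import SVC

# Hypothetical data: a DataFrame with named columns f0..f9.
X, y = make_classification(n_samples=300, n_features=10, n_informative=5, random_state=0)
X = pd.DataFrame(X, columns=[f"f{i}" for i in range(10)])

rfe_model = Pipeline([
    ("rfe", RFE(LogisticRegression(max_iter=5000), n_features_to_select=5)),
    ("model", SVC()),
])

# RFE is refit inside every training fold, so the validation fold
# never influences which features are selected.
scores = cross_validate(rfe_model, X, y, cv=5)
print(scores["test_score"].mean())

# Treat the number of kept features as a hyperparameter.
search = GridSearchCV(rfe_model, {"rfe__n_features_to_select": [3, 5, 8]}, cv=5)
search.fit(X, y)
print(search.best_params_)

# Pull the fitted RFE step out of the best pipeline and read its mask.
best_rfe = search.best_estimator_.named_steps["rfe"]
selected = X.columns[best_rfe.support_]  # boolean indexing, no loop
print(list(selected))

# Or subset the DataFrame directly with the same mask.
X_reduced = X.loc[:, best_rfe.support_]
```

Running RFE inside the pipeline, rather than once on the full data before cross-validation, keeps the feature selection from seeing the validation folds.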