
Why does SequentialFeatureSelector return at most "n_features_in_ - 1" predictors?

I have a training dataset with six features and I am using SequentialFeatureSelector to find an "optimal" subset of the features for a linear regression model. The following code returns three features, which I will call X1, X2, X3.

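from sklearn.feature_selection import SequentialFeatureSelector
from sklearn.linear_model import LinearRegression

sfs = SequentialFeatureSelector(LinearRegression(), n_features_to_select='auto',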
                                tol=0.05, direction='forward', 
                                scoring='neg_root_mean_squared_error', cv=8)
sfs.fit_transform(X_train, y_train)

To check the results, I decided to run the same code using the subset of features X1, X2, X3 instead of X_train. I was expecting to see the features X1, X2, X3 returned again, but instead it was only the features X1, X2. Similarly, using these two features again in the same code returned only X1. It seems that the behavior of sfs is always to return a proper subset of the input features with at most n_features_in_ - 1 columns, but I cannot seem to find this information in the scikit-learn docs. Is this correct, and if so, what is the reasoning for not allowing sfs to return the full set of features?

I also checked to see if using backward selection would return a full feature set.

sfs = SequentialFeatureSelector(LinearRegression(), n_features_to_select='auto', 
                                tol=1000, direction='backward', 
                                scoring='neg_root_mean_squared_error', cv=8)
sfs.fit_transform(X_train, y_train)

I set the threshold tol to be a large value in the hope that there would be no satisfactory improvement from the full set of features of X_train. But, instead of returning the six original features, it only returned five. The docs simply state

If the score is not incremented by at least tol between two consecutive feature additions or removals, stop adding or removing.

So it seems that the full feature set is not being considered during cross-validation, and the behavior of sfs is different at the very end of a forward selection or at the very beginning of a backward selection. If the full set of features outperforms any proper subset of the features, then don't we want sfs to return that possibility? Is there a standard method to compare a selected proper subset of the features and the full set of features using cross-validation?


  • Check the source code of fit(), lines 240-246 of sklearn/feature_selection/_sequential.py:

    if self.n_features_to_select == "auto":
        if self.tol is not None:
            # With auto feature selection, `n_features_to_select_` will be updated
            # to `support_.sum()` after features are selected.
            self.n_features_to_select_ = n_features - 1
        else:
            self.n_features_to_select_ = n_features // 2

    As can be seen, even in auto mode with a given tol, the number of features that can be added is capped at n_features - 1 for some reason (maybe this is worth reporting as an issue on GitHub).
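    The cap explains the forward case; the backward case also depends on the starting score. Assuming the rest of fit() behaves like the simplified loop below (the previous score starts at -np.inf, and the loop breaks once the gain drops below tol), both observations in the question follow: forward selection never gets a sixth step, and the first step is always accepted because any finite score minus -inf is +inf, so backward selection with tol=1000 still removes one feature. simulate_auto_loop() and its score lists are hypothetical and are not scikit-learn code.

```python
import numpy as np


def simulate_auto_loop(scores_by_step, n_features, tol):
    """Hypothetical, simplified re-creation of the 'auto' + tol loop in fit().

    scores_by_step[k] is a made-up best CV score available at step k.
    Returns how many steps are accepted (features added in forward mode,
    features removed in backward mode).
    """
    n_iterations = n_features - 1  # the cap: n_features_to_select_ = n_features - 1
    old_score = -np.inf            # assumption: the loop starts from -inf
    accepted = 0
    for step in range(n_iterations):
        new_score = scores_by_step[step]
        if new_score - old_score < tol:  # gain below tol: stop
            break
        old_score = new_score
        accepted += 1
    return accepted


made_up_scores = [-50, -30, -20, -10, -5, -1]
# Scores keep improving a lot, so tol never triggers, yet only 5 of 6 steps run.
print(simulate_auto_loop(made_up_scores, n_features=6, tol=0.05))  # 5
# Huge tol: the first step is still accepted, since -50 - (-inf) is +inf.
print(simulate_auto_loop(made_up_scores, n_features=6, tol=1000))  # 1
```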

    We can work around this by defining our own function get_best_new_feature_score() (similar to the private method _get_best_new_feature_score() in the source code) and running the selection loop ourselves, as shown below:

    import numpy as np
    from sklearn.model_selection import cross_val_score

    def get_best_new_feature_score(estimator, X, y, cv, current_mask, direction, scoring):
        # features not yet in the mask are the candidates for this step
        candidate_feature_indices = np.flatnonzero(~current_mask)
        scores = {}
        for feature_idx in candidate_feature_indices:
            candidate_mask = current_mask.copy()
            candidate_mask[feature_idx] = True
            if direction == "backward":
                # in backward mode the mask marks removed features, so invert it
                candidate_mask = ~candidate_mask
            X_new = X[:, candidate_mask]
            # mean CV score of the model on the candidate feature set
            scores[feature_idx] = cross_val_score(
                estimator, X_new, y, cv=cv, scoring=scoring
            ).mean()
        # pick the candidate with the highest mean CV score
        new_feature_idx = max(scores, key=lambda feature_idx: scores[feature_idx])
        return new_feature_idx, scores[new_feature_idx]
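    The backward branch of that function is easy to misread, so here is a small sketch of just its mask logic, assuming a 4-feature problem. candidate_masks() is a hypothetical helper that copies the masking steps without any model fitting: in forward mode each candidate keeps one new column, while in backward mode the mask tracks removed features and its complement is what gets fitted.

```python
import numpy as np


def candidate_masks(current_mask, direction):
    """Hypothetical helper mirroring the mask logic of get_best_new_feature_score()."""
    masks = {}
    for feature_idx in np.flatnonzero(~current_mask):
        candidate_mask = current_mask.copy()
        candidate_mask[feature_idx] = True
        if direction == "backward":
            # current_mask marks *removed* features here, so the columns
            # actually passed to the model are the complement.
            candidate_mask = ~candidate_mask
        masks[int(feature_idx)] = candidate_mask
    return masks


start = np.zeros(4, dtype=bool)  # nothing selected / nothing removed yet
for direction in ("forward", "backward"):
    for idx, mask in candidate_masks(start, direction).items():
        print(direction, idx, mask.astype(int))
```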

    Now let's implement the auto (forward) selection on a regression dataset with 5 features: add the features one by one, report the improvement in score at each step, and stop once the improvement is not larger than the provided tol:

    import numpy as np
    from sklearn.datasets import make_regression
    from sklearn.linear_model import LinearRegression
    X, y = make_regression(n_features=5) # data to be used
    # X.shape: (100, 5)
    lm = LinearRegression() # model to be used
    # now implement 'auto' feature selection (forward selection)
    cur_mask = np.zeros(X.shape[1], dtype=bool) # no feature selected initially
    cv, direction, scoring = 8, 'forward', 'neg_root_mean_squared_error'
    tol = 1 # a feature is added only if the score improves by more than tol
    old_score = -np.inf
    for i in range(X.shape[1]):
        idx, new_score = get_best_new_feature_score(lm, X, y, current_mask=cur_mask, cv=cv, direction=direction, scoring=scoring)
        print(new_score - old_score, tol, new_score - old_score > tol)
        if (new_score - old_score) > tol:
            cur_mask[idx] = True
            old_score = new_score
            print(f'feature {idx} added, CV score {new_score}, mask {cur_mask}')
        else:
            break # improvement too small: stop adding features
    # feature 3 added, CV score -90.66899644023539, mask [False False False  True False]
    # feature 1 added, CV score -59.21188041830155, mask [False  True False  True False]
    # feature 2 added, CV score -16.709218665372905, mask [False  True  True  True False]
    # feature 4 added, CV score -3.1862116620446166, mask [False  True  True  True  True]
    # feature 0 added, CV score -1.4011801838814216e-13, mask [ True  True  True  True  True]


    If tol is set to 10 instead, only 4 features are added in forward selection, since the fifth addition improves the score by only about 3.2. Similarly, with tol=20, only 3 features are added (the fourth addition improves the score by about 13.5), as expected.
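    The same idea covers the backward case from the question: if the full set should be a possible outcome, compare the first removal against the full-set CV score instead of -inf. The sketch below is a hypothetical function backward_keep_full(), built on cross_val_score and not part of scikit-learn. It keeps the documented tol rule (stop unless the score improves by at least tol; a negative tol tolerates a small loss), and the shuffled KFold and make_regression data are assumptions for illustration.

```python
import numpy as np
from sklearn.datasets import make_regression
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import KFold, cross_val_score


def cv_score(X, y, mask, cv, scoring):
    """Mean CV score of a linear model on the columns selected by mask."""
    return cross_val_score(LinearRegression(), X[:, mask], y,
                           cv=cv, scoring=scoring).mean()


def backward_keep_full(X, y, tol, cv, scoring="neg_root_mean_squared_error"):
    """Hypothetical backward elimination whose baseline is the full feature set."""
    keep = np.ones(X.shape[1], dtype=bool)
    old_score = cv_score(X, y, keep, cv, scoring)  # baseline: all features, not -inf
    while keep.sum() > 1:
        # Score every subset obtained by dropping one of the kept features.
        trials = {}
        for idx in np.flatnonzero(keep):
            trial = keep.copy()
            trial[idx] = False
            trials[idx] = cv_score(X, y, trial, cv, scoring)
        best_idx = max(trials, key=trials.get)
        if trials[best_idx] - old_score < tol:
            break  # best removal does not improve by at least tol: stop
        keep[best_idx] = False
        old_score = trials[best_idx]
    return keep


X, y = make_regression(n_samples=100, n_features=5, random_state=0)
cv = KFold(n_splits=8, shuffle=True, random_state=0)
print(backward_keep_full(X, y, tol=1000, cv=cv))  # every feature is kept
print(backward_keep_full(X, y, tol=-1e9, cv=cv))  # any loss tolerated: one feature left
```

    With tol=1000 nothing is removed, which is the full-set result the question expected from backward selection.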