
What does "UserWarning: No features were selected" mean?


I am using a LassoCV() model for feature selection. It raises the following warning and does not select any features: "C:\Users\xyz\Anaconda3\lib\site-packages\sklearn\feature_selection\base.py:80: UserWarning: No features were selected: either the data is too noisy or the selection test too strict. UserWarning)"

The code is given below.

The data is available at https://www.kaggle.com/jtrofe/beer-recipes/downloads/recipeData.csv/3

import pandas as pd
import numpy as np
from sklearn.preprocessing import LabelEncoder
from sklearn.feature_selection import SelectFromModel
from sklearn.linear_model import LassoCV

# dataset URL = https://www.kaggle.com/jtrofe/beer-recipes/downloads/recipeData.csv/3
dataframe = pd.read_csv('Brewer Friend Beer Recipes.csv', encoding = 'latin')
# Encoding the non numerical columns
def encoding_data(dataframe):
    if(dataframe.dtype == 'object'):
        return LabelEncoder().fit_transform(dataframe.astype(str))
    else:
        return dataframe
# Feature Selection using the selected Target Feature
def feature_selection(raw_dataframe, target_feature_list):
    output_list = []
    # preprocessing Converting Categorical data into Numeric Data
    dataframe = raw_dataframe.apply(encoding_data)
    column_list = dataframe.columns.tolist()
    dataframe = dataframe.dropna()
    for target in target_feature_list:
        target_feature = target
        x = dataframe.drop(columns=[target_feature])
        y = dataframe[target_feature].values
        # Lasso feature selection 
        estimator = LassoCV(cv = 3, n_alphas = 1)
        featureselection = SelectFromModel(estimator)
        featureselection.fit(x,y)
        features = featureselection.transform(x)
        feature_list = x.columns[featureselection.get_support()]
        features = ''
        features = ', '.join(feature_list)
        l = (target,features)
        output_list.append(l)
    output_df = pd.DataFrame(output_list,columns = ['Name','Selected Features'])
    print('\nThe Feature Selection is done with the respective target feature(s)')
    return output_df
print(feature_selection(dataframe, ['BrewMethod']))

Any idea how to fix this?


Solution

  • If no features were selected, gradually decrease the regularization strength lambda (called alpha in scikit-learn). A smaller alpha penalizes the coefficients less, so some of them will probably become nonzero.

    It is extremely unusual for no coefficients to be selected at all. You should also check the correlations in your data; you may have a lot of collinearity.
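Both suggestions above can be sketched as follows. This is only an illustration under stated assumptions: it uses synthetic data from make_regression rather than the beer-recipe file, a plain Lasso with hand-picked alphas instead of LassoCV, and a hypothetical near-duplicate column (f0_copy) added to show what a collinearity check looks like. Note also that n_alphas = 1 in the question makes LassoCV evaluate a single alpha, so no real search over the penalty takes place.

```python
import numpy as np
import pandas as pd
from sklearn.datasets import make_regression
from sklearn.feature_selection import SelectFromModel
from sklearn.linear_model import Lasso
from sklearn.preprocessing import StandardScaler

# Synthetic stand-in for the real data (an assumption for illustration).
X_raw, y = make_regression(n_samples=200, n_features=8, n_informative=3,
                           noise=5.0, random_state=0)
X = pd.DataFrame(StandardScaler().fit_transform(X_raw),
                 columns=[f"f{i}" for i in range(X_raw.shape[1])])

# Smallest alpha at which Lasso shrinks every coefficient to zero
# (for scikit-learn's objective with centered X and y).
alpha_max = np.abs(X.to_numpy().T @ (y - y.mean())).max() / len(y)

# 1) Gradually decrease alpha and watch features enter the selection.
selected_counts = {}
for factor in (1.1, 0.5, 0.1, 0.01):
    alpha = alpha_max * factor
    selector = SelectFromModel(Lasso(alpha=alpha, max_iter=10000)).fit(X, y)
    selected = X.columns[selector.get_support()].tolist()
    selected_counts[factor] = len(selected)
    print(f"alpha = {alpha:.4f}: {len(selected)} selected {selected}")

# 2) Check for collinearity: list feature pairs with |correlation| > 0.9.
rng = np.random.default_rng(0)
X_check = X.copy()
# Hypothetical near-duplicate column, added only to have something to find.
X_check["f0_copy"] = X_check["f0"] + rng.normal(scale=0.01, size=len(X_check))

corr = X_check.corr().abs()
# Keep the upper triangle so each pair appears once and the diagonal is skipped.
upper = corr.where(np.triu(np.ones(corr.shape, dtype=bool), k=1))
high_pairs = upper.stack()
high_pairs = high_pairs[high_pairs > 0.9]
print(high_pairs)
```

With an alpha above alpha_max nothing is selected, which is the situation the warning describes; as alpha shrinks, features start to appear. On the real data, the same loop can be run over the encoded dataframe, and any pairs reported by the correlation check are candidates for dropping or combining before fitting.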