python scikit-learn feature-detection

How to apply mutual information to categorical features


I am using Scikit-learn to train a classification model. I have both discrete and continuous features in my training data.

I want to do feature selection using mutual information.

Features 1, 2 and 3 are discrete, so I tried the code below:

mutual_info_classif(x, y, discrete_features=[1, 2, 3])

But it did not work; it gives me this error:

 ValueError: could not convert string to float: 'INT'

Solution

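  • mutual_info_classif expects numeric input, so string categories such as 'INT' have to be encoded as numbers before calling it. A simple example with integer-coded discrete features: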

    import numpy as np
    from sklearn.feature_selection import mutual_info_classif
    X = np.array([[0, 0, 0],
                  [1, 1, 0],
                  [2, 0, 1],
                  [2, 0, 1],
                  [2, 0, 1]])
    y = np.array([0, 1, 2, 2, 1])
    mutual_info_classif(X, y, discrete_features=True)
    # result: array([0.67301167, 0.22314355, 0.39575279])
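
  • For the mixed case in the question, one option is to integer-encode only the string columns and keep passing their indices as discrete_features. The sketch below is a minimal illustration, not the asker's actual data: the array X_raw, its values, and the choice of OrdinalEncoder are assumptions made for the example.

```python
import numpy as np
from sklearn.feature_selection import mutual_info_classif
from sklearn.preprocessing import OrdinalEncoder

# Hypothetical data: column 0 is continuous, columns 1-3 hold string categories.
X_raw = np.array([
    [0.5, "INT", "tcp", "a"],
    [1.2, "FIN", "tcp", "b"],
    [3.3, "INT", "udp", "a"],
    [0.7, "FIN", "udp", "c"],
    [2.8, "INT", "tcp", "b"],
    [1.9, "FIN", "tcp", "a"],
    [0.4, "INT", "udp", "c"],
    [3.1, "FIN", "udp", "b"],
], dtype=object)
y = np.array([0, 1, 0, 1, 0, 1, 0, 1])

discrete_cols = [1, 2, 3]

# Build an all-numeric copy: the continuous column is cast to float,
# and each string category is mapped to an integer code.
X = np.empty(X_raw.shape, dtype=float)
X[:, 0] = X_raw[:, 0].astype(float)
X[:, discrete_cols] = OrdinalEncoder().fit_transform(X_raw[:, discrete_cols])

# random_state fixes the small noise added to continuous features.
mi = mutual_info_classif(X, y, discrete_features=discrete_cols, random_state=0)
print(mi)
```

    The integer codes are only labels here: because the columns are declared discrete, the order that the encoder assigns to the categories should not affect their scores. In this toy data column 1 follows y exactly and column 2 is independent of it, so column 1 should score highest among the discrete features.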