I am using scikit-learn to train a classification model. My training data contains both discrete and continuous features.
I want to do feature selection using mutual information.
Features 1, 2 and 3 are discrete. To this end, I tried the code below:
mutual_info_classif(x, y, discrete_features=[1, 2, 3])
but it does not work; it raises the following error:
ValueError: could not convert string to float: 'INT'
For comparison, a simple example with mutual_info_classif, where all features are discrete:
import numpy as np
from sklearn.feature_selection import mutual_info_classif
X = np.array([[0, 0, 0],
              [1, 1, 0],
              [2, 0, 1],
              [2, 0, 1],
              [2, 0, 1]])
y = np.array([0, 1, 2, 2, 1])
mutual_info_classif(X, y, discrete_features=True)
# result: array([ 0.67301167, 0.22314355, 0.39575279])
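
To make the mixed case concrete, here is a minimal sketch. It assumes the error comes from a column of x that holds strings such as 'INT', which cannot be cast to float. All of the data below is made up for illustration, and OrdinalEncoder is only one possible way to turn string categories into numeric codes before passing the column indices through discrete_features.

```python
import numpy as np
from sklearn.feature_selection import mutual_info_classif
from sklearn.preprocessing import OrdinalEncoder

rng = np.random.RandomState(0)
n = 200

# Hypothetical data: column 0 is continuous, columns 1-3 are discrete.
continuous = rng.normal(size=(n, 1))
discrete_int = rng.randint(0, 3, size=(n, 2))
discrete_str = rng.choice(["INT", "FLOAT", "STR"], size=(n, 1))

# String values cannot be converted to float, so map them to numeric codes first.
encoded_str = OrdinalEncoder().fit_transform(discrete_str)

# Assemble a purely numeric matrix; the column order matches the indices below.
X = np.hstack([continuous, discrete_int, encoded_str])
y = rng.randint(0, 2, size=n)

# Mark columns 1, 2 and 3 as discrete; column 0 is treated as continuous.
mi = mutual_info_classif(X, y, discrete_features=[1, 2, 3], random_state=0)
print(mi)
```

Because the estimate for continuous features relies on a randomized nearest-neighbor procedure, fixing random_state keeps the scores reproducible between runs.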