I have a supervised learning problem, and the final step of the pipeline is segmentation. Do the features with the lowest mutual information (MI) affect the clustering process?
My problem is churn customer segmentation: I found some features with zero MI. Should I drop these features?
You should run a feature importance experiment, like this one:
https://github.com/ash-wicus-ml/Notebooks/blob/master/XG%20Boost%20-%20Feature%20Importance.ipynb
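If it helps, here is a rough sketch of that kind of experiment. The linked notebook is titled for XGBoost; this sketch swaps in scikit-learn's `RandomForestClassifier` as a stand-in so it only needs scikit-learn, and it uses a synthetic dataset from `make_classification` in place of real churn data. The column names `f0`..`f5` are made up for illustration. It prints the MI score and the tree-based importance side by side so you can see whether the two rankings agree before dropping anything.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.feature_selection import mutual_info_classif

# Synthetic stand-in for a churn table (assumption: not real data).
# With shuffle=False the first 3 columns are informative, the last 3 are noise.
X, y = make_classification(
    n_samples=1000,
    n_features=6,
    n_informative=3,
    n_redundant=0,
    n_repeated=0,
    shuffle=False,
    random_state=0,
)
feature_names = [f"f{i}" for i in range(X.shape[1])]  # hypothetical names

# Mutual information between each feature and the churn label.
mi = mutual_info_classif(X, y, random_state=0)

# Tree-based importance (random forest here, standing in for XGBoost).
forest = RandomForestClassifier(n_estimators=200, random_state=0).fit(X, y)
importances = forest.feature_importances_

for name, m, imp in zip(feature_names, mi, importances):
    print(f"{name}: MI={m:.3f}  importance={imp:.3f}")
```

A feature that scores near zero on both measures is a reasonable candidate to drop. If the two disagree, it may be worth keeping the feature and checking it more closely, since MI looks at each feature alone while the tree model can pick up interactions.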
Once you know which X variables (features) to use, you can run some clustering experiments:
https://github.com/ash-wicus-ml/Notebooks/blob/master/Clustering%20Algorithms%20Compared.ipynb
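On the original question of whether uninformative features affect clustering: distance-based methods such as k-means use every column you pass in, so columns that carry no structure tend to blur the clusters. Below is a small sketch of that effect, assuming synthetic data (`make_blobs` plus random noise columns) and k-means with a silhouette score as the quality measure; none of this comes from the linked notebook. Note that low MI against the churn label does not by itself prove a feature is useless for segmentation, so treat this as an illustration rather than a rule.

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.datasets import make_blobs
from sklearn.metrics import silhouette_score
from sklearn.preprocessing import StandardScaler

rng = np.random.default_rng(0)

# Two features that carry real cluster structure (assumed, synthetic).
X_signal, _ = make_blobs(
    n_samples=600,
    centers=[[-5, -5], [0, 5], [5, -5]],
    cluster_std=1.0,
    random_state=0,
)
# Four pure-noise features, playing the role of columns you might drop.
X_noise = rng.normal(size=(600, 4))
X_all = np.hstack([X_signal, X_noise])


def cluster_quality(X, k=3):
    """Scale the features, run k-means, and return the silhouette score."""
    X_scaled = StandardScaler().fit_transform(X)
    labels = KMeans(n_clusters=k, n_init=10, random_state=0).fit_predict(X_scaled)
    return silhouette_score(X_scaled, labels)


score_selected = cluster_quality(X_signal)
score_all = cluster_quality(X_all)

print(f"silhouette, selected features only: {score_selected:.3f}")
print(f"silhouette, with noise features:    {score_all:.3f}")
```

You can run the same comparison on your own data: cluster once with all features and once with the zero-MI features removed, then compare the silhouette scores and check whether the resulting segments still make business sense.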