
Drop the features that have a low correlation with the target variable


I have loaded a dataset and computed the correlation coefficient of each feature with respect to the target variable.

Here is the code:

from sklearn.datasets import load_boston
import pandas as pd
import numpy as np
import matplotlib
import matplotlib.pyplot as plt
import seaborn as sns


#Loading the dataset
#(load_boston was removed in scikit-learn 1.2, so this needs an older version)
x = load_boston()
df = pd.DataFrame(x.data, columns = x.feature_names)
df["MEDV"] = x.target
X = df.drop(columns="MEDV")   #Feature Matrix (positional axis is no longer accepted in pandas 2.x)
y = df["MEDV"]          #Target Variable
df.head()


#Using Pearson Correlation
plt.figure(figsize=(12,10))
cor = df.corr()
sns.heatmap(cor, annot=True, cmap=plt.cm.Reds)
plt.show()


#Correlation with output variable
cor_target = abs(cor["MEDV"])

#Selecting highly correlated features
relevant_features = cor_target[cor_target>0.4]
print(relevant_features)

How do I drop the features whose absolute correlation coefficient with the target is below 0.4?


Solution

  • Try this:

    #Selecting the least correlated features
    irrelevant_features = cor_target[cor_target<0.4]

    #Column names of the irrelevant features
    cols = list(irrelevant_features.index)

    #Dropping the irrelevant features
    df = df.drop(columns=cols)
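
If you want to reuse this, the same steps can be wrapped in a small helper. The following is only a minimal sketch: the function name `drop_weakly_correlated` and the synthetic `demo` data are made up for illustration (they stand in for the Boston data so the snippet runs on its own), and the threshold of 0.4 is taken from the question.

```python
import numpy as np
import pandas as pd


def drop_weakly_correlated(df, target, threshold=0.4):
    """Return a copy of df without the columns whose absolute Pearson
    correlation with `target` is below `threshold` (hypothetical helper)."""
    cor_target = df.corr()[target].abs()
    weak = cor_target[cor_target < threshold].index
    # The target correlates perfectly with itself, so it is never dropped.
    return df.drop(columns=weak)


# Synthetic stand-in data: one informative feature, one pure-noise feature.
rng = np.random.default_rng(0)
n = 500
strong = rng.normal(size=n)
demo = pd.DataFrame({
    "strong": strong,
    "noise": rng.normal(size=n),
    "MEDV": 3.0 * strong + rng.normal(scale=0.5, size=n),
})

reduced = drop_weakly_correlated(demo, "MEDV")
print(reduced.columns.tolist())
```

Here the `noise` column should be dropped, while `strong` and the target `MEDV` are kept. Note that the helper returns a new DataFrame and leaves the original untouched.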