
How to calculate pairwise Mutual Information for entire pandas dataset?


I have 50 variables in my dataframe: 46 are dependent variables and 4 are independent variables (precipitation, temperature, dew, snow). I want to calculate the mutual information of each dependent variable against each independent variable.

In the end I want a dataframe like this: [image: desired output table]

Right now I am calculating it with the code below, but it takes a long time because I have to change y manually for each dependent variable:

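import numpy as np
import pandas as pd
from sklearn.feature_selection import mutual_info_regression

X = df[['Temperature', 'Precipitation', 'Dew', 'Snow']]  # features
y = df['N0037']  # target (a 1-D Series, as scikit-learn expects)

mi = mutual_info_regression(X, y)
mi /= np.max(mi)  # scale so the largest score is 1

mi = pd.Series(mi, index=X.columns)
mi = mi.sort_values(ascending=False)  # sort_values returns a new Series
print(mi)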

How can I calculate pairwise mutual information for the entire dataset in one go?


Solution

  • Using a list comprehension:

    import pandas as pd
    from sklearn.feature_selection import mutual_info_regression as mi_reg

    indep_vars = ['Temperature', 'Precipitation', 'Dew', 'Snow']  # independent variables
    dep_vars = df.columns.difference(indep_vars).tolist()  # every other column is dependent

    # One row of MI scores per dependent variable, one column per independent variable
    df_mi = pd.DataFrame(
        [mi_reg(df[indep_vars], df[dep_var]) for dep_var in dep_vars],
        index=dep_vars,
        columns=indep_vars,
    ).apply(lambda x: x / x.max(), axis=1)  # scale each row by its maximum
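
For anyone who wants to try this without the original data, here is a minimal self-contained sketch of the same idea. The helper name `pairwise_mi`, the synthetic `demo` frame, the extra column `N0038`, and the choice of `random_state=0` are all assumptions made for illustration; they are not part of the question. `random_state` is passed because `mutual_info_regression` adds a small amount of random noise to continuous features, so scores can otherwise vary slightly between runs.

```python
import numpy as np
import pandas as pd
from sklearn.feature_selection import mutual_info_regression


def pairwise_mi(df, indep_vars, random_state=0):
    """Return a (dependent x independent) table of row-normalized MI scores."""
    # Hypothetical helper: every column not listed in indep_vars is treated as dependent.
    dep_vars = df.columns.difference(indep_vars).tolist()
    rows = [
        mutual_info_regression(df[indep_vars], df[dep], random_state=random_state)
        for dep in dep_vars
    ]
    table = pd.DataFrame(rows, index=dep_vars, columns=indep_vars)
    # Divide each row by its largest score; a row of all zeros becomes NaN.
    return table.div(table.max(axis=1), axis=0)


# Synthetic stand-in for the real data (made-up values, for illustration only).
rng = np.random.default_rng(0)
n = 300
demo = pd.DataFrame({
    'Temperature': rng.normal(size=n),
    'Precipitation': rng.normal(size=n),
    'Dew': rng.normal(size=n),
    'Snow': rng.normal(size=n),
})
demo['N0037'] = demo['Temperature'] ** 2 + 0.1 * rng.normal(size=n)
demo['N0038'] = demo['Snow'] + 0.1 * rng.normal(size=n)

df_mi = pairwise_mi(demo, ['Temperature', 'Precipitation', 'Dew', 'Snow'])
print(df_mi)
```

Note that `mutual_info_regression` does not accept missing values, so on real data you would probably need to drop or fill NaNs in the relevant columns first.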