I have 50 variables in my dataframe: 46 dependent variables and 4 independent variables (precipitation, temperature, dew, snow). I want to calculate the mutual information of each dependent variable against the independent variables.
So in the end I want a dataframe like this
Right now I am calculating it with the following code, but it takes a long time because I have to change y by hand for each dependent variable:
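import numpy as np
import pandas as pd
from sklearn.feature_selection import mutual_info_regression
X = df[['Temperature', 'Precipitation', 'Dew', 'Snow']]  # features
y = df['N0037']  # target as a 1-D Series (a one-column DataFrame triggers a shape warning)
mi = mutual_info_regression(X, y)
mi /= np.max(mi)  # scale so the largest score is 1
mi = pd.Series(mi, index=X.columns)
mi = mi.sort_values(ascending=False)  # sort_values returns a new Series, so reassign it
print(mi)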
How can I calculate the pairwise mutual information for the entire dataset?
Use a list comprehension to compute one row of scores per dependent variable:
import pandas as pd
from sklearn.feature_selection import mutual_info_regression as mi_reg
indep_vars = ['Temperature', 'Precipitation', 'Dew', 'Snow']  # independent variables
dep_vars = df.columns.difference(indep_vars).tolist()  # all remaining columns are dependent variables
rows = [mi_reg(df[indep_vars], df[dep_var]) for dep_var in dep_vars]  # one array of 4 MI scores per dependent variable
df_mi = pd.DataFrame(rows, index=dep_vars, columns=indep_vars)
df_mi = df_mi.apply(lambda x: x / x.max(), axis=1)  # scale each row so its largest score is 1
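To try the approach above end to end without the real data, here is a minimal sketch on synthetic data. The dependent column names (N0001, N0002, N0003) and the relationships built into them are made up for illustration, and the guard against an all-zero row is an addition that the original one-liner does not have.

```python
import numpy as np
import pandas as pd
from sklearn.feature_selection import mutual_info_regression

# Synthetic stand-in for the real dataframe (hypothetical columns and relationships)
rng = np.random.default_rng(0)
n = 300
df = pd.DataFrame({
    "Temperature": rng.normal(size=n),
    "Precipitation": rng.normal(size=n),
    "Dew": rng.normal(size=n),
    "Snow": rng.normal(size=n),
})
df["N0001"] = df["Temperature"] + 0.1 * rng.normal(size=n)
df["N0002"] = df["Snow"] ** 2 + 0.1 * rng.normal(size=n)
df["N0003"] = df["Dew"] - df["Precipitation"] + 0.1 * rng.normal(size=n)

indep_vars = ["Temperature", "Precipitation", "Dew", "Snow"]
dep_vars = df.columns.difference(indep_vars).tolist()

# One call per dependent variable; each call returns one score per independent variable
X = df[indep_vars]
mi_rows = {
    dep: mutual_info_regression(X, df[dep], random_state=0)
    for dep in dep_vars
}
df_mi = pd.DataFrame.from_dict(mi_rows, orient="index", columns=indep_vars)

# Row-wise scaling so the largest score in each row is 1;
# a row whose scores are all zero becomes NaN instead of dividing by zero
row_max = df_mi.max(axis=1).replace(0, np.nan)
df_mi_norm = df_mi.div(row_max, axis=0)

print(df_mi_norm.round(2))
```

Passing `random_state` makes the scores repeatable, since the estimator adds a small amount of random noise to continuous features. Note that after row-wise scaling the values are only comparable within a row, not across dependent variables; keep `df_mi` if you need the raw scores.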