In Excel, when I want to sum values across a table, I can perform a sumif if a condition is excluded based on the variable name. Does Pandas have a similar function?
To give more context, I am looking at a confusion matrix and determining the % of classifications that are correct vs. incorrect.
Excel example:
Fantasy Horror RPG
Fantasy 25 3 1
Horror 1 13 0
RPG 11 1 7
==(sum(B2:B4) - SUMIF($A2:$A4,"<>"&B$1,B2:B4))/sum(B2:B4)
Results:
0.68 0.76 0.88
I know of the groupby function, but as of my understanding, it groups based on categorical variables, not through exclusion like what can be done with sumif.
Example:
import pandas as pd
my_data: pd.DataFrame = pd.read_excel("c:/mydata.xlsx")
mydata.groupby("genre").count()
Example Code
import pandas as pd
idx = ['Fantasy', 'Horror', 'RPG']
df = pd.DataFrame([[25, 3, 1], [ 1, 13, 0], [11, 1, 7]], index=idx, columns=idx)
df:
Fantasy Horror RPG
Fantasy 25 3 1
Horror 1 13 0
RPG 11 1 7
Code
tmp = df.unstack()
cond = tmp.index.get_level_values(0) == tmp.index.get_level_values(1)
out = tmp[cond].droplevel(1).div(tmp.groupby(level=0).sum())
out
Fantasy 0.675676
Horror 0.764706
RPG 0.875000
dtype: float64
If values is a square matrix because index and columns are the same order and the same size, use the following code
df1 = df.copy()
np.fill_diagonal(df1.values, 0)
out = df1.sum().div(df.sum()).rsub(1)
same result but more faster