[SOLVED] Does Pandas have a similar way to perform exclusion sumif?

Does Pandas have a similar way to perform exclusion sumif?

In Excel, when I want to sum values across a table, I can perform a sumif if a condition is excluded based on the variable name. Does Pandas have a similar function?

To give more context, I am looking at a confusion matrix and determining the % of classifications that are correct vs. incorrect.

Excel example:

        Fantasy  Horror  RPG
Fantasy   25      3       1
Horror    1       13      0
RPG       11      1       7

 ==(sum(B2:B4) - SUMIF($A2:$A4,"<>"&B$1,B2:B4))/sum(B2:B4)

Results:

       0.68    0.76    0.88

I know of the groupby function, but as of my understanding, it groups based on categorical variables, not through exclusion like what can be done with sumif.

Example:

import pandas as pd


my_data: pd.DataFrame = pd.read_excel("c:/mydata.xlsx")
mydata.groupby("genre").count()

Solution

Example Code

import pandas as pd
idx = ['Fantasy', 'Horror', 'RPG']
df = pd.DataFrame([[25,  3,  1], [ 1, 13,  0], [11,  1,  7]], index=idx, columns=idx)

df:

         Fantasy  Horror  RPG
Fantasy       25       3    1
Horror         1      13    0
RPG           11       1    7

Code

tmp = df.unstack()
cond = tmp.index.get_level_values(0) == tmp.index.get_level_values(1)
out = tmp[cond].droplevel(1).div(tmp.groupby(level=0).sum())

out

Fantasy    0.675676
Horror     0.764706
RPG        0.875000
dtype: float64

If values is a square matrix because index and columns are the same order and the same size, use the following code

df1 = df.copy()
np.fill_diagonal(df1.values, 0)
out = df1.sum().div(df.sum()).rsub(1)

same result but more faster