pythonpandas

Does Pandas have a similar way to perform exclusion sumif?


In Excel, when I want to sum values across a table, I can perform a sumif if a condition is excluded based on the variable name. Does Pandas have a similar function?

To give more context, I am looking at a confusion matrix and determining the % of classifications that are correct vs. incorrect.

Excel example:

        Fantasy  Horror  RPG
Fantasy   25      3       1
Horror    1       13      0
RPG       11      1       7

 ==(sum(B2:B4) - SUMIF($A2:$A4,"<>"&B$1,B2:B4))/sum(B2:B4)

Results:

       0.68    0.76    0.88

I know of the groupby function, but as of my understanding, it groups based on categorical variables, not through exclusion like what can be done with sumif.

Example:

import pandas as pd


my_data: pd.DataFrame = pd.read_excel("c:/mydata.xlsx")
mydata.groupby("genre").count()

Solution

  • Example Code

    import pandas as pd
    idx = ['Fantasy', 'Horror', 'RPG']
    df = pd.DataFrame([[25,  3,  1], [ 1, 13,  0], [11,  1,  7]], index=idx, columns=idx)
    

    df:

             Fantasy  Horror  RPG
    Fantasy       25       3    1
    Horror         1      13    0
    RPG           11       1    7
    

    Code

    tmp = df.unstack()
    cond = tmp.index.get_level_values(0) == tmp.index.get_level_values(1)
    out = tmp[cond].droplevel(1).div(tmp.groupby(level=0).sum())
    

    out

    Fantasy    0.675676
    Horror     0.764706
    RPG        0.875000
    dtype: float64
    

    If values is a square matrix because index and columns are the same order and the same size, use the following code

    df1 = df.copy()
    np.fill_diagonal(df1.values, 0)
    out = df1.sum().div(df.sum()).rsub(1)
    

    same result but more faster