
Pairwise Cohen's kappa of rows in a pandas DataFrame (Python)


I'd greatly appreciate some help with this. I'm using Jupyter Notebook.

I have a DataFrame for which I want to calculate the interrater reliability. I want to compare rows pairwise by the value of the ID column (every ID occurs exactly twice, once for each coder). Each ID value represents a different article, so I do not want to compare all rows together, but rather take the average of the interrater reliability of each pair (and potentially also of each column).

N.  ID.     A.  B.      
0   8818313 Yes Yes     1.0 1.0 1.0 1.0 1.0 1.0
1   8818313 Yes No      0.0 1.0 0.0 0.0 1.0 1.0 
2   8820105 No  Yes     0.0 1.0 1.0 1.0 1.0 1.0 
3   8820106 No  No      0.0 0.0 0.0 1.0 0.0 0.0 

I've been able to find some instructions for computing Cohen's kappa, but not for doing this pairwise by the value in the ID column.

Does anyone know how to go about this?


Solution

  • Here is how I would approach it:

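    from io import StringIO

    import pandas as pd
    from sklearn.metrics import cohen_kappa_score

    # Sample data: two rows (one per coder) for each article ID.
    # The space-separated ratings are read into a single string column, Nums.
    df = pd.read_csv(StringIO("""
    N,ID,A,B,Nums
    0,   8818313, Yes, Yes,1.0 1.0 1.0 1.0 1.0 1.0
    1,   8818313, Yes, No,0.0 1.0 0.0 0.0 1.0 1.0 
    2,   8820105, No,  Yes,0.0 1.0 1.0 1.0 1.0 1.0 
    3,   8820105, No,  No,0.0 0.0 0.0 1.0 0.0 0.0 """))


    def kappa(group):
        # Each group holds exactly two rows: the two coders for one ID.
        # `if num` drops empty strings produced by trailing spaces.
        nums1 = [float(num) for num in group.Nums.iloc[0].split(' ') if num]
        nums2 = [float(num) for num in group.Nums.iloc[1].split(' ') if num]
        return cohen_kappa_score(nums1, nums2)


    # One kappa value per article ID
    df.groupby('ID').apply(kappa)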
    

    This will generate:

    ID
    8818313    0.000000
    8820105    0.076923
    dtype: float64
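
To see where these numbers come from, and to get the average across articles that the question asks about, here is a minimal standard-library sketch. The helper `cohen_kappa` and the `ratings` dictionary are hypothetical names introduced for illustration; the helper applies the textbook formula kappa = (p_o - p_e) / (1 - p_e) and is not scikit-learn's implementation, although it gives the same values for this sample data.

```python
from collections import Counter
from statistics import mean


def cohen_kappa(r1, r2):
    """Cohen's kappa for two equal-length sequences of labels (illustrative helper)."""
    if len(r1) != len(r2) or not r1:
        raise ValueError("both raters need the same, non-zero number of ratings")
    n = len(r1)
    # Observed agreement: share of items on which the two raters agree.
    p_o = sum(a == b for a, b in zip(r1, r2)) / n
    # Expected chance agreement, from each rater's label frequencies.
    c1, c2 = Counter(r1), Counter(r2)
    p_e = sum(c1[label] * c2[label] for label in c1.keys() | c2.keys()) / n ** 2
    if p_e == 1:
        # Both raters used one identical label throughout; kappa is undefined.
        return float("nan")
    return (p_o - p_e) / (1 - p_e)


# The two coders' ratings for each article ID, copied from the sample data.
ratings = {
    8818313: ([1, 1, 1, 1, 1, 1], [0, 1, 0, 0, 1, 1]),
    8820105: ([0, 1, 1, 1, 1, 1], [0, 0, 0, 1, 0, 0]),
}

per_id = {article_id: cohen_kappa(a, b) for article_id, (a, b) in ratings.items()}
average_kappa = mean(per_id.values())

print(per_id)         # kappa per article: 0.0 and 1/13 (about 0.0769)
print(average_kappa)  # 1/26, about 0.0385
```

With the pandas version, calling `.mean()` on the Series returned by `df.groupby('ID').apply(kappa)` should give the same average.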