python pandas dataframe kruskal-wallis

Create vectors for a Kruskal-Wallis H-test in Python


I have a dataset like this:

import pandas as pd

df = pd.DataFrame({'numbers': range(9), 'group': ['a', 'b', 'c'] * 3})

 group numbers
0   a   0
1   b   1
2   c   2
3   a   3
4   b   4
5   c   5
6   a   6
7   b   7
8   c   8

I want to create the vectors

a = [0, 3, 6]
b = [1, 4, 7]
c = [2, 5, 8]

to pass to the Kruskal-Wallis H-test in Python:

from scipy import stats

stats.kruskal(a, b, c)

or perhaps an analogue of R's formula notation (numbers ~ group).
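For reference, the statistic that stats.kruskal(a, b, c) reports can be reproduced by hand for these vectors. The sketch below is a minimal illustration, assuming there are no tied values (SciPy additionally divides by a tie-correction factor when ties exist); kruskal_h_no_ties is a hypothetical helper name, not part of SciPy. It ranks the pooled sample 1..N, sums the ranks R_i of each group and applies H = 12 / (N(N + 1)) * sum(R_i^2 / n_i) - 3(N + 1). Here the rank sums are 12, 15 and 18, so H = 12/90 * (144 + 225 + 324)/3 - 30 = 0.8, compared against a chi-squared distribution with 2 degrees of freedom.

```python
import numpy as np
from scipy import stats

# The three groups from the question
a = [0, 3, 6]
b = [1, 4, 7]
c = [2, 5, 8]


def kruskal_h_no_ties(*groups):
    """Hypothetical helper: textbook Kruskal-Wallis H, assuming no ties."""
    pooled = np.concatenate([np.asarray(g, dtype=float) for g in groups])
    ranks = stats.rankdata(pooled)  # ranks 1..N over the pooled sample
    n_total = len(pooled)
    weighted = 0.0
    start = 0
    for g in groups:
        n_i = len(g)
        rank_sum = ranks[start:start + n_i].sum()  # R_i for this group
        weighted += rank_sum ** 2 / n_i
        start += n_i
    return 12.0 / (n_total * (n_total + 1)) * weighted - 3 * (n_total + 1)


h_manual = kruskal_h_no_ties(a, b, c)
result = stats.kruskal(a, b, c)

print(h_manual)          # approximately 0.8
print(result.statistic)  # same value, since there are no ties to correct for
print(result.pvalue)     # approximately 0.670, i.e. exp(-0.8 / 2) for 2 degrees of freedom
```

With real data that contains ties, rely on stats.kruskal itself rather than this simplified helper.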


Solution

  • I'm not familiar with any special requirements of the Kruskal-Wallis test, but you can access these grouped arrays by putting them into a dictionary this way:

    groupednumbers = {}
    for grp in df['group'].unique(): 
        groupednumbers[grp] = df['numbers'][df['group']==grp].values
    
    print(groupednumbers)
    # {'a': array([0, 3, 6]), 'b': array([1, 4, 7]), 'c': array([2, 5, 8])}
    

    That is, you can get your vectors either by indexing the dictionary explicitly, e.g. groupednumbers['a'], or all at once as a list:

    args = list(groupednumbers.values())
    

    ... or, if you need them in a particular order:

    args = [groupednumbers[grp] for grp in sorted(df['group'].unique())]
    

    Then call:

    stats.kruskal(*args)
    

    Or, if you need actual Python lists, you can use list(df['numbers'][...].values).
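In R the formula version would be kruskal.test(numbers ~ group, data = df). A rough pandas analogue, sketched here as an alternative to the manual loop and assuming a pandas version recent enough to have Series.to_numpy, is to let groupby do the splitting: iterating over df.groupby('group')['numbers'] yields (key, Series) pairs, with keys sorted by default. The names samples and as_lists are only illustrative.

```python
import pandas as pd
from scipy import stats

df = pd.DataFrame({'numbers': range(9), 'group': ['a', 'b', 'c'] * 3})

# One NumPy array of 'numbers' per value of 'group' (keys sorted: a, b, c)
samples = [grp.to_numpy() for _, grp in df.groupby('group')['numbers']]
print([s.tolist() for s in samples])  # [[0, 3, 6], [1, 4, 7], [2, 5, 8]]

# Same split as plain Python lists, keyed by group name
as_lists = {key: grp.tolist() for key, grp in df.groupby('group')['numbers']}
print(as_lists)  # {'a': [0, 3, 6], 'b': [1, 4, 7], 'c': [2, 5, 8]}

# Unpack the per-group samples into the test, as with stats.kruskal(*args)
result = stats.kruskal(*samples)
print(result.statistic, result.pvalue)
```

This avoids the repeated boolean masking over the whole frame, which may matter when there are many groups.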