
How to apply a function to all possible tuples of two groups obtained by groupby


I am grouping my data as below:

all_groups = df.groupby('age').groups

Printing all_groups shows:

{1.0: [11, 14, 15, 22], 2.0: [12, 13, 27], 3.0: [16, 17, 19, 20, 23, 24],
6.0: [21], 7.0: [18, 25, 26], 11.0: [0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10]}

Now I want to run stats.mannwhitneyu on every possible pair of groups. In this example I have 6 groups, so there are 15 possible pairs, e.g., stats.mannwhitneyu(class1, class2), stats.mannwhitneyu(class1, class3), ..., stats.mannwhitneyu(class7, class11).

I need a general approach, especially since I don't know the number of groups in advance. What is the cleanest way to do this? Thank you in advance.


Solution

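  • You could build a GroupBy object, then apply your test to every pair of groups. Iterating over a GroupBy yields (key, values) tuples, so a[0] is the group key and a[1] is that group's data: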

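    from itertools import combinations

    import pandas as pd
    from scipy.stats import mannwhitneyu

    # select the column to test within each 'age' group
    groups = df.groupby('age')['value']
    # one row per pair of groups, indexed by the (key_a, key_b) tuple
    out = pd.DataFrame.from_dict({(a[0], b[0]): mannwhitneyu(a[1], b[1])
                                  for a, b in combinations(groups, 2)},
                                orient='index')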
    

    Example:

              statistic    pvalue
    (0, 1)         17.0  0.939860
    (0, 2)         14.0  1.000000
    (0, 3)         61.0  0.205667
    (0, 4)         28.0  0.757692
    (0, 5)         20.0  0.797203
    ...             ...       ...
    (16, 18)        8.0  1.000000
    (16, 19)       13.0  0.380952
    (17, 18)       17.0  0.420635
    (17, 19)       21.0  0.329004
    (18, 19)       18.0  0.662338
    
    [190 rows x 2 columns]
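
As a quick sanity check, the 190 rows match the number of unordered pairs among the 20 age groups in the example input, just as 6 groups give 15 pairs in the question. A minimal sketch using only the standard library (the variable names are illustrative):

```python
from itertools import combinations
from math import comb

# the 6 group keys shown in the question
question_keys = [1.0, 2.0, 3.0, 6.0, 7.0, 11.0]

# every unordered pair of keys, in the order combinations() produces them
pairs = list(combinations(question_keys, 2))

n_pairs_question = len(pairs)   # pairs among the 6 groups of the question
n_pairs_example = comb(20, 2)   # pairs among the 20 groups of the example
```

In general, n groups give n * (n - 1) / 2 pairs, so the number of tests grows quadratically with the number of groups.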
    

    Used input:

    import numpy as np
    import pandas as pd

    np.random.seed(0)  # fixed seed so the example is reproducible
    df = pd.DataFrame({'age': np.random.randint(0, 20, 100),
                       'value': np.random.random(100)
                      })
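
The question started from df.groupby('age').groups, which maps each group key to the index labels of its rows rather than to the data itself. A minimal sketch of the same pairwise test starting from that dict, assuming the tested column is called 'value' as in the example input (the names rows and out_from_groups are illustrative, not part of the original answer):

```python
from itertools import combinations

import numpy as np
import pandas as pd
from scipy.stats import mannwhitneyu

# same synthetic input as in the example
np.random.seed(0)
df = pd.DataFrame({'age': np.random.randint(0, 20, 100),
                   'value': np.random.random(100)})

# maps each group key to the index labels of the rows in that group
all_groups = df.groupby('age').groups

rows = []
for (key_a, idx_a), (key_b, idx_b) in combinations(all_groups.items(), 2):
    # look up the actual values for each group by index label
    res = mannwhitneyu(df.loc[idx_a, 'value'], df.loc[idx_b, 'value'])
    rows.append({'group_a': key_a, 'group_b': key_b,
                 'statistic': res.statistic, 'pvalue': res.pvalue})

out_from_groups = pd.DataFrame(rows)
```

Iterating the GroupBy object directly, as in the answer, avoids the extra df.loc lookups; this variant is only useful if the .groups dict is already what you have.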
    

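    If you want a square matrix of p-values instead, you can use squareform, which expands the list of pairwise p-values into a symmetric matrix. Note that squareform fills the diagonal with zeros; these are placeholders, not p-values of a group tested against itself: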

    from scipy.spatial.distance import squareform
    
    idx = sorted(df['age'].unique())
    out = pd.DataFrame(squareform([mannwhitneyu(a[1], b[1]).pvalue
                                   for a, b in combinations(groups, 2)]),
                       index=idx, columns=idx).sort_index().sort_index(axis=1)
    

    Output:

              0         1         2         3         4         5         6         7         8         9         10        11        12        13        14        15        16        17        18        19
    0   0.000000  0.939860  1.000000  0.205667  0.757692  0.797203  0.297702  0.330070  0.863636  0.035964  0.260140  0.727273  0.148252  1.000000  0.898102  0.114161  1.000000  0.898102  0.699301  0.528671
    1   0.939860  0.000000  0.857143  0.187812  0.787879  0.730159  0.285714  0.485714  0.857143  0.066667  0.485714  1.000000  0.200000  1.000000  0.904762  0.163636  1.000000  0.555556  1.000000  0.609524
    2   1.000000  0.857143  0.000000  0.286713  0.666667  0.571429  0.250000  1.000000  1.000000  0.095238  0.400000  1.000000  0.228571  0.800000  1.000000  0.266667  1.000000  0.392857  1.000000  0.714286
    3   0.205667  0.187812  0.286713  0.000000  0.193233  0.055278  0.953047  0.733267  0.468531  0.313187  0.839161  0.216783  0.023976  0.363636  0.439560  0.417318  0.216783  0.055278  0.206460  0.792458
    4   0.757692  0.787879  0.666667  0.193233  0.000000  0.876263  0.267677  0.315152  1.000000  0.073427  0.230303  0.833333  0.315152  0.888889  0.530303  0.164918  1.000000  1.000000  0.431818  0.533800
    5   0.797203  0.730159  0.571429  0.055278  0.876263  0.000000  0.150794  0.190476  1.000000  0.017316  0.063492  0.785714  0.555556  0.857143  0.690476  0.106061  1.000000  1.000000  0.309524  0.246753
    6   0.297702  0.285714  0.250000  0.953047  0.267677  0.150794  0.000000  0.555556  0.571429  0.428571  1.000000  0.250000  0.063492  0.380952  0.309524  0.755051  0.392857  0.095238  0.222222  0.930736
    7   0.330070  0.485714  1.000000  0.733267  0.315152  0.190476  0.555556  0.000000  0.400000  0.114286  0.685714  0.857143  0.028571  0.800000  0.412698  0.527273  0.400000  0.111111  0.555556  0.914286
    8   0.863636  0.857143  1.000000  0.468531  1.000000  1.000000  0.571429  0.400000  0.000000  0.166667  0.628571  1.000000  0.400000  1.000000  0.571429  0.266667  1.000000  1.000000  1.000000  0.904762
    9   0.035964  0.066667  0.095238  0.313187  0.073427  0.017316  0.428571  0.114286  0.166667  0.000000  0.609524  0.047619  0.009524  0.285714  0.051948  0.730769  0.166667  0.004329  0.051948  0.240260
    10  0.260140  0.485714  0.400000  0.839161  0.230303  0.063492  1.000000  0.685714  0.628571  0.609524  0.000000  0.228571  0.057143  0.533333  0.412698  0.927273  0.400000  0.111111  0.285714  0.761905
    11  0.727273  1.000000  1.000000  0.216783  0.833333  0.785714  0.250000  0.857143  1.000000  0.047619  0.228571  0.000000  0.228571  1.000000  0.785714  0.266667  1.000000  0.571429  1.000000  0.714286
    12  0.148252  0.200000  0.228571  0.023976  0.315152  0.555556  0.063492  0.028571  0.400000  0.009524  0.057143  0.228571  0.000000  0.533333  0.063492  0.024242  0.628571  0.285714  0.063492  0.171429
    13  1.000000  1.000000  0.800000  0.363636  0.888889  0.857143  0.380952  0.800000  1.000000  0.285714  0.533333  1.000000  0.533333  0.000000  1.000000  0.333333  0.800000  0.857143  1.000000  0.642857
    14  0.898102  0.904762  1.000000  0.439560  0.530303  0.690476  0.309524  0.412698  0.571429  0.051948  0.412698  0.785714  0.063492  1.000000  0.000000  0.343434  0.785714  0.841270  0.841270  0.792208
    15  0.114161  0.163636  0.266667  0.417318  0.164918  0.106061  0.755051  0.527273  0.266667  0.730769  0.927273  0.266667  0.024242  0.333333  0.343434  0.000000  0.266667  0.073232  0.202020  0.365967
    16  1.000000  1.000000  1.000000  0.216783  1.000000  1.000000  0.392857  0.400000  1.000000  0.166667  0.400000  1.000000  0.628571  0.800000  0.785714  0.266667  0.000000  1.000000  1.000000  0.380952
    17  0.898102  0.555556  0.392857  0.055278  1.000000  1.000000  0.095238  0.111111  1.000000  0.004329  0.111111  0.571429  0.285714  0.857143  0.841270  0.073232  1.000000  0.000000  0.420635  0.329004
    18  0.699301  1.000000  1.000000  0.206460  0.431818  0.309524  0.222222  0.555556  1.000000  0.051948  0.285714  1.000000  0.063492  1.000000  0.841270  0.202020  1.000000  0.420635  0.000000  0.662338
    19  0.528671  0.609524  0.714286  0.792458  0.533800  0.246753  0.930736  0.914286  0.904762  0.240260  0.761905  0.714286  0.171429  0.642857  0.792208  0.365967  0.380952  0.329004  0.662338  0.000000
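
The zeros on the diagonal of the matrix above come from squareform, not from a test, and can be mistaken for highly significant results. A minimal sketch that marks them as missing instead, assuming the same synthetic df as in the example (the names keys, mat and pmat are illustrative):

```python
from itertools import combinations

import numpy as np
import pandas as pd
from scipy.spatial.distance import squareform
from scipy.stats import mannwhitneyu

# same synthetic input as in the example
np.random.seed(0)
df = pd.DataFrame({'age': np.random.randint(0, 20, 100),
                   'value': np.random.random(100)})

groups = df.groupby('age')['value']
keys = [key for key, _ in groups]  # group keys in iteration order

# pairwise p-values in the same order that squareform expects
pvals = [mannwhitneyu(a, b).pvalue
         for (_, a), (_, b) in combinations(groups, 2)]

mat = squareform(pvals)        # symmetric matrix with zeros on the diagonal
np.fill_diagonal(mat, np.nan)  # self-comparisons were never tested
pmat = pd.DataFrame(mat, index=keys, columns=keys)
```

With NaN on the diagonal, a filter such as pmat < 0.05 no longer flags each group against itself.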