pythonpandascorrelationtukey

How to convert R's Tukey's HSD table into correlation matrix in Python using Pandas


I have recently exported a table from R's TukeyHSD test to obtain the p-values for various time groups (0, 5, 10, 20, 30, 40, 50, 60). I'm curious if there's a method to transform this into a correlation matrix, where each axis represents the time groups and corresponds to the respective p-value.

The table includes an index indicating the correspondence between the different time groups (e.g., 5-10 or 10-50). I've imported it as a dataframe into Python. Is there a way to rearrange the dataframe as depicted below?

        p adj
Groups  
50-0    2.815526e-13
60-0    2.855494e-13
20-0    4.764197e-08
50-5    1.712389e-05
50-10   1.483440e-04
50-40   1.643480e-04
60-5    5.873007e-04
60-10   5.218047e-03
60-40   5.613566e-03
10-0    6.878476e-03
40-0    1.270855e-02
20-5    7.380859e-02
50-20   1.574372e-01
40-20   3.264569e-01
20-10   3.369147e-01
5-0 3.816166e-01
60-50   7.301423e-01
60-20   8.503578e-01
10-5    9.731384e-01
40-5    9.820983e-01
40-10   1.000000e+00

I want it to be something like:

        0    5    10    20    ...
0       ...  ...  ...  ...
5       ...  ...  ...  ...
10      ...  ...  ...  ...
20      ...  ...  ...  ...
...

I haven't found anything similar online, so I don't know where to start.


Solution

  • Try:

    df[["x", "y"]] = df.index.str.split("-", expand=True).to_frame().astype(int).values
    print(pd.crosstab(df["x"], df["y"], df["p adj"], aggfunc="first"))
    
    

    Prints:

    y             0         5         10        20        40        50
    x                                                                 
    5   3.816166e-01       NaN       NaN       NaN       NaN       NaN
    10  6.878476e-03  0.973138       NaN       NaN       NaN       NaN
    20  4.764197e-08  0.073809  0.336915       NaN       NaN       NaN
    40  1.270855e-02  0.982098  1.000000  0.326457       NaN       NaN
    50  2.815526e-13  0.000017  0.000148  0.157437  0.000164       NaN
    60  2.855494e-13  0.000587  0.005218  0.850358  0.005614  0.730142