I have recently exported a table from R's TukeyHSD test to obtain the p-values for various time groups (0, 5, 10, 20, 30, 40, 50, 60). I'm curious if there's a method to transform this into a correlation matrix, where each axis represents the time groups and corresponds to the respective p-value.
The table includes an index indicating the correspondence between the different time groups (e.g., 5-10 or 10-50). I've imported it as a dataframe into Python. Is there a way to rearrange the dataframe as depicted below?
p adj
Groups
50-0 2.815526e-13
60-0 2.855494e-13
20-0 4.764197e-08
50-5 1.712389e-05
50-10 1.483440e-04
50-40 1.643480e-04
60-5 5.873007e-04
60-10 5.218047e-03
60-40 5.613566e-03
10-0 6.878476e-03
40-0 1.270855e-02
20-5 7.380859e-02
50-20 1.574372e-01
40-20 3.264569e-01
20-10 3.369147e-01
5-0 3.816166e-01
60-50 7.301423e-01
60-20 8.503578e-01
10-5 9.731384e-01
40-5 9.820983e-01
40-10 1.000000e+00
I want it to be something like:
0 5 10 20 ...
0 ... ... ... ...
5 ... ... ... ...
10 ... ... ... ...
20 ... ... ... ...
...
I haven't found anything similar online, so I don't know where to start.
Try:
df[["x", "y"]] = df.index.str.split("-", expand=True).to_frame().astype(int).values
print(pd.crosstab(df["x"], df["y"], df["p adj"], aggfunc="first"))
Prints:
y 0 5 10 20 40 50
x
5 3.816166e-01 NaN NaN NaN NaN NaN
10 6.878476e-03 0.973138 NaN NaN NaN NaN
20 4.764197e-08 0.073809 0.336915 NaN NaN NaN
40 1.270855e-02 0.982098 1.000000 0.326457 NaN NaN
50 2.815526e-13 0.000017 0.000148 0.157437 0.000164 NaN
60 2.855494e-13 0.000587 0.005218 0.850358 0.005614 0.730142