I have a dataset with 4 conditions (A, B, C, D). Running a one-way ANOVA, I observed a linear increase of my dependent variable (reaction time, RT) across the 4 conditions.
I would like to run a Tukey HSD post-hoc test to see whether the increases in RT from A to B, from B to C, and from C to D are significant.
To run the test in Python, I am using the following code:
#Multiple Comparison of Means - Tukey HSD
from statsmodels.stats.multicomp import pairwise_tukeyhsd
print(pairwise_tukeyhsd(df["RT"], df['Cond']))
The problem I am facing is that this assumes I am interested in all possible comparisons (A vs B, A vs C, A vs D, B vs C, B vs D, C vs D), so the correction applied is based on 6 tests. However, I only have hypotheses about 3 comparisons (A vs B, B vs C, C vs D).
How can I inform the post-hoc test about the number/type of comparisons I am interested in?
Unfortunately, you cannot. Tukey HSD is not like a pairwise t-test with a multiple-comparison adjustment applied to the raw p-values; the p-value you see comes from the studentized range (q) distribution, which already accounts for all of the groups.
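For intuition, here is a rough sketch (not the actual statsmodels internals; it assumes SciPy >= 1.7 for scipy.stats.studentized_range) of how one Tukey-Kramer pairwise p-value is obtained. The number of groups k is a parameter of the q distribution itself, which is why the "correction" is baked in and cannot be limited to 3 comparisons:
import numpy as np
from scipy.stats import studentized_range

def tukey_pvalue_sketch(y_i, y_j, mse, df_resid, k):
    # y_i, y_j: observations in the two groups being compared
    # mse: mean squared error from the one-way ANOVA
    # df_resid: residual degrees of freedom (N - k)
    # k: total number of groups; it enters the q distribution directly
    se = np.sqrt(mse / 2 * (1 / len(y_i) + 1 / len(y_j)))
    q = abs(np.mean(y_i) - np.mean(y_j)) / se
    return studentized_range.sf(q, k, df_resid)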
One way you can do this is to fit a linear model (which is equivalent to your ANOVA), run pairwise t-tests on the coefficients, and subset the comparisons you need.
To illustrate this I use some simulated data; this is what Tukey HSD would look like:
import pandas as pd
import numpy as np
from statsmodels.formula.api import ols
from statsmodels.stats.multicomp import pairwise_tukeyhsd
from statsmodels.stats.multitest import multipletests
np.random.seed(123)
# simulated data: 100 RTs randomly assigned to the 4 conditions
df = pd.DataFrame({'RT': np.random.randn(100),
                   'Cond': np.random.choice(['A', 'B', 'C', 'D'], 100)})
hs_res = pairwise_tukeyhsd(df["RT"], df['Cond'])
print(hs_res)
Multiple Comparison of Means - Tukey HSD, FWER=0.05
===================================================
group1 group2 meandiff p-adj lower upper reject
---------------------------------------------------
A B -0.6598 0.2428 -1.5767 0.2571 False
A C -0.3832 0.6946 -1.3334 0.567 False
A D -0.634 0.2663 -1.5402 0.2723 False
B C 0.2766 0.7861 -0.5358 1.0891 False
B D 0.0258 0.9 -0.7347 0.7864 False
C D -0.2508 0.8257 -1.0513 0.5497 False
---------------------------------------------------
Now we fit an OLS model, and you can see the results are pretty comparable:
res = ols("RT ~ Cond", df).fit()
# pairwise t-tests on the model coefficients; "sh" = Simes-Hochberg correction
pw = res.t_test_pairwise("Cond", method="sh")
pw.result_frame
coef std err t P>|t| Conf. Int. Low Conf. Int. Upp. pvalue-sh reject-sh
B-A -0.659798 0.350649 -1.881645 0.062914 -1.355831 0.036236 0.352497 False
C-A -0.383176 0.363404 -1.054407 0.294343 -1.104528 0.338176 0.829463 False
D-A -0.633950 0.346604 -1.829032 0.070499 -1.321954 0.054054 0.352497 False
C-B 0.276622 0.310713 0.890281 0.375541 -0.340138 0.893382 0.829463 False
D-B 0.025847 0.290885 0.088858 0.929380 -0.551555 0.603250 0.929380 False
D-C -0.250774 0.306140 -0.819147 0.414731 -0.858458 0.356910 0.829463 False
Then we choose the subset and the method of correction; below I use Simes-Hochberg, as above:
# keep only the 3 planned comparisons, then re-correct the raw p-values
subdf = pw.result_frame.loc[['B-A', 'C-B', 'D-C']].copy()
subdf['adj_p'] = multipletests(subdf['P>|t|'].values, method='sh')[1]
subdf
coef std err t P>|t| Conf. Int. Low Conf. Int. Upp. pvalue-sh reject-sh adj_p
B-A -0.659798 0.350649 -1.881645 0.062914 -1.355831 0.036236 0.352497 False 0.188742
C-B 0.276622 0.310713 0.890281 0.375541 -0.340138 0.893382 0.829463 False 0.414731
D-C -0.250774 0.306140 -0.819147 0.414731 -0.858458 0.356910 0.829463 False 0.414731
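If you prefer a different correction for those 3 planned tests, you only need to change the method argument of multipletests; for example (Holm is chosen here purely for illustration):
# Holm step-down instead of Simes-Hochberg, applied to the same 3 raw p-values
subdf['adj_p_holm'] = multipletests(subdf['P>|t|'].values, method='holm')[1]
subdf[['P>|t|', 'adj_p', 'adj_p_holm']]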
As a comment, if you see a trend, there might be other models to capture it (see the sketch below) instead of relying on a post-hoc test. Also, subsetting to the tests you need and then applying a correction can be argued to be a form of cherry-picking. If the number of comparisons is small (like the 6 in your example), I suggest you go with Tukey. This is another discussion you could post on Cross Validated.
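For example, if it is reasonable to treat the four conditions as equally spaced steps, a minimal sketch of such a trend model is to code them numerically and test a single slope (the 0-3 coding below is an assumption about your design, not something given in your question):
# assumes A, B, C, D are equally spaced; the 0-3 coding is illustrative only
df['Cond_num'] = df['Cond'].map({'A': 0, 'B': 1, 'C': 2, 'D': 3})
trend = ols("RT ~ Cond_num", df).fit()
print(trend.summary())  # the Cond_num coefficient tests the linear trend directly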