Tags: r, ggplot2, ggpubr, rstatix

Adding mean comparisons to plot + Is it possible to display p-values in ggplot (or R in general) from a KS test, specifically on a violin plot?


I'm seeking to create something like this:

Example Output

Using my own data, I would be specifically using the p-values I found here:

KS test p-values

I was able to produce something similar, albeit with the wrong test: a t-test instead of a KS test.

T test p-value

I produced this by writing this code:

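library(ggplot2)
library(ggpubr)  # provides stat_compare_means()

# Violin plot with a narrow white boxplot drawn inside each violin
l <- ggplot(VioPos, aes(x = Regulation, y = Score, fill = Regulation)) +
  geom_violin(trim = FALSE) +
  labs(title = "Plot of ARE Scores by Regulation", x = "Gene Regulation", y = "ARE Score") +
  geom_boxplot(width = 0.1, fill = "white") +
  theme_classic()
l

# Same plot on a log2 y-axis
dp <- l + scale_y_continuous(trans = "log2")
dp

# Add pairwise t-test p-values; my_comparisons (defined elsewhere) is the list of group pairs to compare
dp7 <- dp +
  stat_compare_means(comparisons = my_comparisons, method = "t.test")
dp7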

In other words, I used stat_compare_means() from ggpubr, with ggplot2, tidyverse, and rstatix also loaded.

However, if I modify the method in the code, it seems to display correctly for Wilcoxon and T tests, but not for ANOVA and Kruskal-Wallis tests. Moreover, it seems that stat_compare_means() only supports those four and not KS, but I'm specifically interested in plotting mean comparisons from my KS test output onto my violin plots. Is there some other package I can use?

Also, please note: in the KS test, "UpScorePos", "DownScorePos", etc. were used to compare ARE scores by regulation (as in the t-test plot above).


Solution

  • You can get the p-value from a KS-test like this:

    x <- rnorm(100)
    y <- rnorm(100)
    res <- ks.test(x, y)
    res$p.value
    #> [1] 0.9670685   (varies from run to run, since no seed is set)
    

    Just use this p-value and add it to your plots.

    EDIT: A somewhat hacky solution is to run a t-test to get a data structure that can be used with stat_pvalue_manual(), then overwrite its p-values with those from ks.test(). See the example below (I used the ToothGrowth data as an example).

    library(ggplot2)
    library(ggpubr)   # stat_pvalue_manual()
    library(rstatix)  # t_test(), add_y_position()
    
    # Transform `dose` into a factor variable
    df <- ToothGrowth
    df$dose <- as.factor(df$dose)
    
    # Run pairwise t-tests only to obtain a tibble with the columns
    # stat_pvalue_manual() uses (group1, group2, p, y.position, ...)
    stat.test <- df %>%
      t_test(len ~ dose) %>%
      add_y_position()
    stat.test
    
    kst <- stat.test # copy the tibble; its p-values are overwritten below
    
    # KS-test p-values, in the same row order as stat.test:
    # 0.5 vs 1, 0.5 vs 2, 1 vs 2
    p1 <- ks.test(x = ToothGrowth$len[ToothGrowth$dose == 0.5],
                  y = ToothGrowth$len[ToothGrowth$dose == 1]
    )$p.value
    p2 <- ks.test(x = ToothGrowth$len[ToothGrowth$dose == 0.5],
                  y = ToothGrowth$len[ToothGrowth$dose == 2]
    )$p.value
    p3 <- ks.test(x = ToothGrowth$len[ToothGrowth$dose == 1],
                  y = ToothGrowth$len[ToothGrowth$dose == 2]
    )$p.value
    
    # Only `p` is replaced; p.adj and p.adj.signif still come from the t-test
    kst[, 'p'] <- as.numeric(c(p1, p2, p3))
    
    ggplot(df, aes(x = dose, y = len)) +
      geom_violin(trim = FALSE) +
      stat_pvalue_manual(kst, label = "p = {p}")
    

    violin plot with KS-test p-values
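
With more than three groups, writing one ks.test() call per pair by hand gets tedious. As a rough sketch in base R only, the pairwise p-values could be collected in a loop over all pairs of factor levels. Note that `pairwise_ks` is a hypothetical helper name introduced here for illustration, not a function from ggpubr or rstatix, and the ToothGrowth data is reused only as an example.

```r
# Sketch: two-sample KS tests for every pair of levels of a grouping variable.
# `pairwise_ks` is a hypothetical helper, not part of any package.
pairwise_ks <- function(values, groups) {
  groups <- as.factor(groups)
  # All pairs of levels as a 2 x n_pairs character matrix
  pairs <- combn(levels(groups), 2)
  p <- apply(pairs, 2, function(pr) {
    # ks.test() may warn when the data contain ties; silenced for brevity
    suppressWarnings(
      ks.test(values[groups == pr[1]], values[groups == pr[2]])$p.value
    )
  })
  data.frame(group1 = pairs[1, ], group2 = pairs[2, ], p = p,
             stringsAsFactors = FALSE)
}

ks_tab <- pairwise_ks(ToothGrowth$len, ToothGrowth$dose)
ks_tab
```

For the ToothGrowth example the pairs come out in the order 0.5 vs 1, 0.5 vs 2, 1 vs 2, so `ks_tab$p` should be usable in place of `c(p1, p2, p3)` above. For other data it is worth checking that `group1` and `group2` line up with the rows of the t-test tibble before overwriting its `p` column.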