r violin-plot

Half violin plot that ignores some factors in R


I have a question very similar to the one asked in "Half violin plot with different factors in R", which was perfectly answered by @AllanCameron. In addition to the data from that question, I also have different grades:

student_id  group       grade    test_id  Score
145         Treatment   2        pre      0.12
145         Treatment   3        post     0.78
109         Treatment   5        pre      0.45
109         Treatment   5        post     0.99
195         Treatment   4        pre      0.22
195         Treatment   4        post     0.75
119         Treatment   6        pre      0.15
119         Treatment   6        post     0.59

I would like to make a half-violin plot where one factor is the pre-/posttest and the two halves consist of the 3rd/4th grade for the posttest and the 5th/6th grade for the pretest:

violin

I've played around with the code provided in the previous answer, but the only thing I came up with is plotting the two grades separately as factors and then cutting and pasting them together. Not very elegant! I hope someone has a better way of achieving this.

Here is a MWE:

set.seed(1)
data <- data.frame(
                 group = rep(sample(c('Treatment', 'Control'), 50, TRUE), 
                             each = 2),
                 test_id = rep(c('pre', 'post'), 50),
                 grade = sample(3:6, 100, replace = TRUE),
                 Score = runif(100)
                 )

library(ggplot2)
library(see)

ggplot(data, aes(test_id, Score, fill = grade)) +
  geom_boxplot(width = 0.1, position = position_dodge(0.2)) +
  geom_violinhalf(aes(group = interaction(test_id, grade)), fill = 'gray',
                  trim = FALSE, flip = c(1, 2)) +
  theme_classic(16)

This produces the following undesired plot:

violin_wrong


Solution

  • I'm not 100% sure whether you want separate boxplots for the groups or just one boxplot. To fix the issue with the violin plots, however, you can filter the data used for geom_violinhalf so that it includes only grades 3 and 4 for the pre-test data and grades 5 and 6 for the post-test data. Additionally, as you now have four groups, you have to set flip = c(1, 3) to flip the left-hand violins.

    Note: For the reprex I mapped "grade" to fill in geom_violinhalf to check and show that it displays the right grades.

    library(ggplot2)
    library(see)
    
    ggplot(data, aes(test_id, Score, fill = group)) +
      geom_boxplot(width = 0.1, position = position_dodge(0.2)) +
      geom_violinhalf(
        data = ~ subset(
          .x,
          (test_id %in% "pre" & grade %in% c(3, 4)) |
            (test_id %in% "post" & grade %in% c(5, 6))
        ),
        aes(group = interaction(test_id, grade), fill = factor(grade)),
        #fill = "gray",
        trim = FALSE, 
        flip = c(1, 3)
      ) +
      scale_x_discrete(limits = c("pre", "post")) +
      theme_classic()
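
To double-check that the filter keeps only the intended test/grade combinations, one could run the same subset() condition outside the plot and cross-tabulate the result. This is only a minimal base-R sketch: it rebuilds the question's example data with the same seed, and the names violin_data and counts are my own illustrative choices, not part of the answer above.

```r
# Rebuild the example data from the question (same seed and calls as the MWE).
set.seed(1)
data <- data.frame(
  group   = rep(sample(c("Treatment", "Control"), 50, TRUE), each = 2),
  test_id = rep(c("pre", "post"), 50),
  grade   = sample(3:6, 100, replace = TRUE),
  Score   = runif(100)
)

# Same condition as the one passed to geom_violinhalf(data = ...).
# violin_data is a hypothetical name used only for this check.
violin_data <- subset(
  data,
  (test_id %in% "pre" & grade %in% c(3, 4)) |
    (test_id %in% "post" & grade %in% c(5, 6))
)

# Cross-tabulate test_id against grade. Fixing the factor levels keeps
# all rows and columns in the table, even those with zero counts.
counts <- table(
  test_id = factor(violin_data$test_id, levels = c("pre", "post")),
  grade   = factor(violin_data$grade, levels = 3:6)
)
print(counts)
```

If the filter works as intended, the cells for pre/5, pre/6, post/3 and post/4 should all be zero, so each violin half is drawn from exactly one test/grade combination.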