rlme4anovamanovalimma

Multi-group differential gene expression for time-series treatment data


This is an example dataset:

df = data.frame(genes = c("A", "B", "C", "D", "E"),
                KO_0min_Rep1 = c(0, 1, 2, 6, 6),
                KO_0min_Rep2 = c(0, 3, 2, 3, 6),
                KO_60min_Rep1 = c(0, 0.3, 2, 9.1, 6),
                KO_60min_Rep2 = c(0, 1.3, 2, 6.4, 6),
                KO_120min_Rep1 = c(0, 1, 1, 6, 5),
                KO_120min_Rep2 = c(0, 1, 2.1, 6.8, 5.2),
                WT_0min_Rep1 = c(0, 1, 2, 6, 6),
                WT_0min_Rep2 = c(0, 1, 1.6, 3, 6),
                WT_60min_Rep1 = c(0, 1, 2, 9, 6),
                WT_60min_Rep2 = c(0, 0.3, 2, 6, 2),
                WT_120min_Rep1 = c(0, 1.9, 2, 2, 6),
                WT_120min_Rep2 = c(0, 1.2, 2, 6, 2)  )

The data-frame has several columns, of which the "genes" column has >9000 genes and all other columns are various conditions and treatments. The experimental design is as follows: I have two kinds of cell types: wild type (WT) and knockout (KO). To both of these cell types I treated cells with a DNA damaging agent for 0 minutes, 60 minutes, and 120 minutes. I also have two replicates for these conditions and treatments combinations.

I want to know the genes that are significantly altered after the treatments, but more importantly between the WT and KO conditions over the treatment time points.

This is what I have tried:

lme4_dat = df %>%
  tidyr::gather(conditions, value, -genes) %>%
  dplyr::mutate( group = case_when(grepl("KO", conditions) ~ "KO",
                                   grepl("WT", conditions) ~ "WT")) %>%
  dplyr::mutate( time = case_when(grepl("UT", conditions) ~ "0",
                                  grepl("60", conditions) ~ "60",
                                  grepl("120", conditions) ~ "120" )) %>%
  dplyr::mutate( replicate = case_when(grepl("_Rep1", conditions) ~ "Rep1",
                                       grepl("_Rep2", conditions) ~ "Rep2"))

Then I try to fit a linear mixed-effects model

lme4_model = lme4::lmer(value ~ conditions * time + (1|genes) + (1|replicate), data = lme4_dat)

It's obviously not working. I am not sure if I am doing it correctly? Or is there a better alternative?

Any guidance will be much appreciated. Thank you.


Solution

  • I would suggest repeated measures ANOVA with one within-groups and one between-groups factor.

    lme4_dat$time[is.na(lme4_dat$time)] = 0
    lme4_dat$time = as.factor(lme4_dat$time)
    fit <- aov(value ~ time*group + Error(genes/time), lme4_dat)
    summary(fit)
    library(HH)
    interaction2wt(value ~ time*group, lme4_dat)
    interaction.plot(lme4_dat$time, lme4_dat$group, lme4_dat$value)
    

    Output

    Error: genes
              Df Sum Sq Mean Sq F value Pr(>F)
    Residuals  4  310.4   77.59               
    
    Error: genes:time
              Df Sum Sq Mean Sq F value Pr(>F)
    time       2  2.617   1.309   0.424  0.668
    Residuals  8 24.678   3.085               
    
    Error: Within
               Df Sum Sq Mean Sq F value Pr(>F)
    group       1   2.48  2.4807   1.892  0.176
    time:group  2   0.21  0.1047   0.080  0.923
    Residuals  42  55.07  1.3112   
    

    The group (KO/WT) is borderline significant. The time is less so. The interaction isn't significant with a p.value of .923, which you can also tell by looking at parallel lines in the interaction plot. At time 60 the response increased before dropping at 120.

    enter image description here

    enter image description here