rt-test

Optimize data for t.test to avoid "data are essentially constant" error


There are several Stack Overflow posts about t.test() in R producing the error "data are essentially constant". As far as I understand, this happens because there is not enough variation in the groups being compared (the values are all identical), so the t-test cannot be computed. (Correct me if there is another cause.)
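To make the condition concrete, here is a minimal base-R sketch using made-up toy vectors (not the data below). It assumes the default Welch test: the error shows up when both groups are constant, so the standard error is zero, while a single constant group on its own does not trigger it.

```r
# Toy vectors, hypothetical and for illustration only
const_a <- c(0.30103, 0.30103)
const_b <- c(0.30103, 0.30103)
varying <- c(0.30103, 0.69897, 0.47712)

# Both groups constant: the standard error is zero, so t.test() stops.
# tryCatch() captures the error message instead of aborting the script.
res_const <- tryCatch(
  t.test(const_a, const_b),
  error = function(e) conditionMessage(e)
)
print(res_const)

# Only one group constant: the default (Welch) test still runs
res_mixed <- t.test(const_a, varying)
print(res_mixed$p.value)
```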

I'm in this situation, and I would like to fix it by altering my data in a way that doesn't drastically change its statistical properties, so the t-test stays valid. Would it work to add a tiny amount of variation to the data (e.g. change 0.301029995663981 to 0.301029995663990), or what else can I do?

For example, this is my data:

# Create the data frame
data <- data.frame(Date = c("2021.08","2021.08","2021.09","2021.09","2021.09","2021.10","2021.10","2021.10","2021.11","2021.11","2021.11","2021.11","2021.11","2021.12","2021.12","2022.01","2022.01","2022.01","2022.01","2022.08","2022.08","2022.08","2022.08","2022.08","2022.09","2022.09","2022.10","2022.10","2022.10","2022.11","2022.11","2022.11","2022.11","2022.11","2022.12","2022.12","2022.12","2022.12","2023.01","2023.01","2023.01","2023.01","2021.08","2021.08","2021.09","2021.09","2021.09","2021.10","2021.10","2021.10","2021.11","2021.11","2021.11","2021.11","2021.11","2021.12","2021.12","2022.01","2022.01","2022.01","2022.01","2022.08","2022.08","2022.08","2022.08","2022.08","2022.09","2022.09","2022.09","2022.09","2022.10","2022.10","2022.10","2022.10","2022.11","2022.11","2022.11","2022.11","2022.11","2022.12","2022.12","2022.12","2022.12","2023.01","2023.01","2023.01","2023.01"),
Species = c("A","A","A","A","A","A","A","A","A","A","A","A","A","A","A","A","A","A","A","A","A","A","A","A","A","A","A","A","A","A","A","A","A",
"A","A","A","A","A","A","A","A","A","B","B","B","B","B","B","B","B","B","B","B","B","B","B","B","B","B","B","B","B","B","B","B","B","B","B","B",
"B","B","B","B","B","B","B","B","B","B","B","B","B","B","B","B","B","B"),
Site = c("Something","Something","Something","Something","Something","Something","Something","Something","Something","Something","Something",
"Something","Something","Something","Something","Something","Something","Something","Something","Something","Something","Something","Something",
"Something","Something","Something","Something","Something","Something","Something","Something","Something","Something","Something","Something",
"Something","Something","Something","Something","Something","Something","Something","Something","Something","Something","Something","Something",
"Something","Something","Something","Something","Something","Something","Something","Something","Something","Something","Something","Something",
"Something","Something","Something","Something","Something","Something","Something","Something","Something","Something","Something","Something",
"Something","Something","Something","Something","Something","Something","Something","Something","Something","Something","Something","Something",
"Something","Something","Something","Something"),
Mean = c("0.301029995663981","1.07918124604762","0.698970004336019","1.23044892137827","1.53147891704226","1.41497334797082","1.7160033436348",
         "0.698970004336019","1.39794000867204","1","0.301029995663981","0.301029995663981","0.477121254719662","0.301029995663981","0.301029995663981",
         "0.301029995663981","0.477121254719662","0.301029995663981","0.301029995663981","0.845098040014257","0.301029995663981","0.301029995663981",
         "0.477121254719662","0.698970004336019","1.23044892137827","1.41497334797082","1.95904139232109","1.5910646070265","1.53147891704226",
         "1.14612803567824","1.57978359661681","1.34242268082221","0.778151250383644","0.301029995663981","0.301029995663981","0.477121254719662",
         "0.301029995663981","1.20411998265592","0.845098040014257","1.17609125905568","1.20411998265592","0.698970004336019","0.301029995663981",
         "0.698970004336019","0.698970004336019","0.903089986991944","1.14612803567824","0.301029995663981","0.602059991327962","0.301029995663981",
         "0.845098040014257","0.698970004336019","0.698970004336019","0.301029995663981","0.698970004336019","0.301029995663981","0.301029995663981",
         "0.301029995663981","0.477121254719662","0.301029995663981","0.301029995663981","0.301029995663981","0.301029995663981","0.301029995663981",
         "0.602059991327962","0.301029995663981","0.845098040014257","1.92941892571429","1.27875360095283","0.698970004336019","1.38021124171161",
         "1.20411998265592","1.38021124171161","1.14612803567824","1","1.07918124604762","1.17609125905568","0.845098040014257","0.698970004336019",
         "0.778151250383644","0.301029995663981","0.845098040014257","1.64345267648619","1.46239799789896","1.34242268082221","1.34242268082221",
         "0.778151250383644"))

Afterwards, I set the factors:

# Set factors
str(data)
data$Date<-as.factor(data$Date)
data$Site<-as.factor(data$Site)
data$Species<-as.factor(data$Species)
data$Mean<-as.numeric(data$Mean)
str(data)

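When I try a t-test with compare_means() from ggpubr:

library(ggpubr)  # provides compare_means() and stat_compare_means(); also attaches ggplot2
compare_means(Mean ~ Species, data = data, group.by = "Date", method = "t.test")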

This is the error:
Error in `mutate()`:
ℹ In argument: `p = purrr::map(...)`.
Caused by error in `purrr::map()`:
ℹ In index: 5.
ℹ With name: Date.2021.12.
Caused by error in `t.test.default()`:
! data are essentially constant
Run `rlang::last_trace()` to see where the error occurred.

Similarly, when I use this in ggplot:

ggplot(data, aes(x = Date, y = Mean, fill=Species)) +
  geom_boxplot()+
  stat_compare_means(data=data,method="t.test", label = "p.signif") +
  theme(axis.text.x = element_text(angle = 90, vjust = 0.5, hjust=1))

Warning message:
Computation failed in `stat_compare_means()`
Caused by error in `mutate()`:
ℹ In argument: `p = purrr::map(...)`.
Caused by error in `purrr::map()`:
ℹ In index: 5.
ℹ With name: x.5.
Caused by error in `t.test.default()`:
! data are essentially constant 

What is the best solution that keeps the data usable for a t-test?


Solution

  • Compute the sd of Mean for each Date-Species combination, then filter out any Date in which a group has an sd of 0. In this data that drops 2021.12, where both species have the same constant value. You can pipe the filtered data straight into compare_means():

    library(dplyr)
    library(ggpubr)
    data <- data.frame(Date = c("2021.08","2021.08","2021.09","2021.09","2021.09","2021.10","2021.10","2021.10","2021.11","2021.11","2021.11","2021.11","2021.11","2021.12","2021.12","2022.01","2022.01","2022.01","2022.01","2022.08","2022.08","2022.08","2022.08","2022.08","2022.09","2022.09","2022.10","2022.10","2022.10","2022.11","2022.11","2022.11","2022.11","2022.11","2022.12","2022.12","2022.12","2022.12","2023.01","2023.01","2023.01","2023.01","2021.08","2021.08","2021.09","2021.09","2021.09","2021.10","2021.10","2021.10","2021.11","2021.11","2021.11","2021.11","2021.11","2021.12","2021.12","2022.01","2022.01","2022.01","2022.01","2022.08","2022.08","2022.08","2022.08","2022.08","2022.09","2022.09","2022.09","2022.09","2022.10","2022.10","2022.10","2022.10","2022.11","2022.11","2022.11","2022.11","2022.11","2022.12","2022.12","2022.12","2022.12","2023.01","2023.01","2023.01","2023.01"),
                       Species = c("A","A","A","A","A","A","A","A","A","A","A","A","A","A","A","A","A","A","A","A","A","A","A","A","A","A","A","A","A","A","A","A","A",
                                   "A","A","A","A","A","A","A","A","A","B","B","B","B","B","B","B","B","B","B","B","B","B","B","B","B","B","B","B","B","B","B","B","B","B","B","B",
                                   "B","B","B","B","B","B","B","B","B","B","B","B","B","B","B","B","B","B"),
                       Site = c("Something","Something","Something","Something","Something","Something","Something","Something","Something","Something","Something",
                                "Something","Something","Something","Something","Something","Something","Something","Something","Something","Something","Something","Something",
                                "Something","Something","Something","Something","Something","Something","Something","Something","Something","Something","Something","Something",
                                "Something","Something","Something","Something","Something","Something","Something","Something","Something","Something","Something","Something",
                                "Something","Something","Something","Something","Something","Something","Something","Something","Something","Something","Something","Something",
                                "Something","Something","Something","Something","Something","Something","Something","Something","Something","Something","Something","Something",
                                "Something","Something","Something","Something","Something","Something","Something","Something","Something","Something","Something","Something",
                                "Something","Something","Something","Something"),
                       Mean = c("0.301029995663981","1.07918124604762","0.698970004336019","1.23044892137827","1.53147891704226","1.41497334797082","1.7160033436348",
                                "0.698970004336019","1.39794000867204","1","0.301029995663981","0.301029995663981","0.477121254719662","0.301029995663981","0.301029995663981",
                                "0.301029995663981","0.477121254719662","0.301029995663981","0.301029995663981","0.845098040014257","0.301029995663981","0.301029995663981",
                                "0.477121254719662","0.698970004336019","1.23044892137827","1.41497334797082","1.95904139232109","1.5910646070265","1.53147891704226",
                                "1.14612803567824","1.57978359661681","1.34242268082221","0.778151250383644","0.301029995663981","0.301029995663981","0.477121254719662",
                                "0.301029995663981","1.20411998265592","0.845098040014257","1.17609125905568","1.20411998265592","0.698970004336019","0.301029995663981",
                                "0.698970004336019","0.698970004336019","0.903089986991944","1.14612803567824","0.301029995663981","0.602059991327962","0.301029995663981",
                                "0.845098040014257","0.698970004336019","0.698970004336019","0.301029995663981","0.698970004336019","0.301029995663981","0.301029995663981",
                                "0.301029995663981","0.477121254719662","0.301029995663981","0.301029995663981","0.301029995663981","0.301029995663981","0.301029995663981",
                                "0.602059991327962","0.301029995663981","0.845098040014257","1.92941892571429","1.27875360095283","0.698970004336019","1.38021124171161",
                                "1.20411998265592","1.38021124171161","1.14612803567824","1","1.07918124604762","1.17609125905568","0.845098040014257","0.698970004336019",
                                "0.778151250383644","0.301029995663981","0.845098040014257","1.64345267648619","1.46239799789896","1.34242268082221","1.34242268082221",
                                "0.778151250383644"))
    
    data$Date<-as.factor(data$Date)
    data$Site<-as.factor(data$Site)
    data$Species<-as.factor(data$Species)
    data$Mean<-as.numeric(data$Mean)
    
    data %>% 
      group_by(Date, Species) %>% 
      mutate(s = sd(Mean)) %>% 
      group_by(Date) %>%
      filter(!any(s == 0)) %>% 
      compare_means(Mean ~ Species, data = ., group.by = "Date", method = "t.test")
    #> # A tibble: 11 × 9
    #>    Date    .y.   group1 group2      p p.adj p.format p.signif method
    #>    <fct>   <chr> <chr>  <chr>   <dbl> <dbl> <chr>    <chr>    <chr> 
    #>  1 2021.08 Mean  A      B      0.718   1    0.718    ns       T-test
    #>  2 2021.09 Mean  A      B      0.451   1    0.451    ns       T-test
    #>  3 2021.10 Mean  A      B      0.0889  0.89 0.089    ns       T-test
    #>  4 2021.11 Mean  A      B      0.850   1    0.850    ns       T-test
    #>  5 2022.01 Mean  A      B      1       1    1.000    ns       T-test
    #>  6 2022.08 Mean  A      B      0.234   1    0.234    ns       T-test
    #>  7 2022.09 Mean  A      B      0.670   1    0.670    ns       T-test
    #>  8 2022.10 Mean  A      B      0.0707  0.78 0.071    ns       T-test
    #>  9 2022.11 Mean  A      B      0.783   1    0.783    ns       T-test
    #> 10 2022.12 Mean  A      B      0.399   1    0.399    ns       T-test
    #> 11 2023.01 Mean  A      B      0.255   1    0.255    ns       T-test
    

    Created on 2023-06-01 with reprex v2.0.2
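If you would rather keep every Date in the result and mark the untestable ones, one possible alternative is to catch the error and return NA for that Date. The sketch below is base R only; `safe_t_p` and the `toy` data frame are hypothetical names and values invented for illustration, not part of ggpubr or of the data above.

```r
# Hypothetical helper: p-value of a Welch t-test, or NA if t.test() errors
safe_t_p <- function(x, y) {
  tryCatch(t.test(x, y)$p.value, error = function(e) NA_real_)
}

# Toy data (made up), using the same column names as the question
toy <- data.frame(
  Date    = rep(c("2021.11", "2021.12"), each = 6),
  Species = rep(rep(c("A", "B"), each = 3), times = 2),
  Mean    = c(0.30, 0.48, 1.00, 0.70, 0.85, 0.30,  # 2021.11: values vary
              0.50, 0.50, 0.50, 0.50, 0.50, 0.50)  # 2021.12: constant
)

# One p-value per Date; a Date where the test cannot be computed gives NA
p_by_date <- sapply(split(toy, toy$Date), function(d) {
  safe_t_p(d$Mean[d$Species == "A"], d$Mean[d$Species == "B"])
})
print(p_by_date)
```

Compared with filtering, this keeps a row for each Date, which can be convenient when the results are later joined back to a plot or table. Note that the p-values here are unadjusted.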