There are several StackOverflow posts about the situation where t.test() in R produces the error "data are essentially constant". As I understand it, this happens because there is not enough variation in the groups being compared to run the t-test. (Correct me if there is something else.)
I'm in this situation, and I would like to fix it by altering my data in a way that doesn't change its statistical properties drastically, so the t-test stays valid. I was wondering whether I could add some very small variation to the data (e.g. change 0.301029995663981 to 0.301029995663990), or what else I could do.
For example, this is my data:
# Create the data frame
data <- data.frame(Date = c("2021.08","2021.08","2021.09","2021.09","2021.09","2021.10","2021.10","2021.10","2021.11","2021.11","2021.11","2021.11","2021.11","2021.12","2021.12","2022.01","2022.01","2022.01","2022.01","2022.08","2022.08","2022.08","2022.08","2022.08","2022.09","2022.09","2022.10","2022.10","2022.10","2022.11","2022.11","2022.11","2022.11","2022.11","2022.12","2022.12","2022.12","2022.12","2023.01","2023.01","2023.01","2023.01","2021.08","2021.08","2021.09","2021.09","2021.09","2021.10","2021.10","2021.10","2021.11","2021.11","2021.11","2021.11","2021.11","2021.12","2021.12","2022.01","2022.01","2022.01","2022.01","2022.08","2022.08","2022.08","2022.08","2022.08","2022.09","2022.09","2022.09","2022.09","2022.10","2022.10","2022.10","2022.10","2022.11","2022.11","2022.11","2022.11","2022.11","2022.12","2022.12","2022.12","2022.12","2023.01","2023.01","2023.01","2023.01"),
Species = c("A","A","A","A","A","A","A","A","A","A","A","A","A","A","A","A","A","A","A","A","A","A","A","A","A","A","A","A","A","A","A","A","A",
"A","A","A","A","A","A","A","A","A","B","B","B","B","B","B","B","B","B","B","B","B","B","B","B","B","B","B","B","B","B","B","B","B","B","B","B",
"B","B","B","B","B","B","B","B","B","B","B","B","B","B","B","B","B","B"),
Site = c("Something","Something","Something","Something","Something","Something","Something","Something","Something","Something","Something",
"Something","Something","Something","Something","Something","Something","Something","Something","Something","Something","Something","Something",
"Something","Something","Something","Something","Something","Something","Something","Something","Something","Something","Something","Something",
"Something","Something","Something","Something","Something","Something","Something","Something","Something","Something","Something","Something",
"Something","Something","Something","Something","Something","Something","Something","Something","Something","Something","Something","Something",
"Something","Something","Something","Something","Something","Something","Something","Something","Something","Something","Something","Something",
"Something","Something","Something","Something","Something","Something","Something","Something","Something","Something","Something","Something",
"Something","Something","Something","Something"),
Mean = c("0.301029995663981","1.07918124604762","0.698970004336019","1.23044892137827","1.53147891704226","1.41497334797082","1.7160033436348",
"0.698970004336019","1.39794000867204","1","0.301029995663981","0.301029995663981","0.477121254719662","0.301029995663981","0.301029995663981",
"0.301029995663981","0.477121254719662","0.301029995663981","0.301029995663981","0.845098040014257","0.301029995663981","0.301029995663981",
"0.477121254719662","0.698970004336019","1.23044892137827","1.41497334797082","1.95904139232109","1.5910646070265","1.53147891704226",
"1.14612803567824","1.57978359661681","1.34242268082221","0.778151250383644","0.301029995663981","0.301029995663981","0.477121254719662",
"0.301029995663981","1.20411998265592","0.845098040014257","1.17609125905568","1.20411998265592","0.698970004336019","0.301029995663981",
"0.698970004336019","0.698970004336019","0.903089986991944","1.14612803567824","0.301029995663981","0.602059991327962","0.301029995663981",
"0.845098040014257","0.698970004336019","0.698970004336019","0.301029995663981","0.698970004336019","0.301029995663981","0.301029995663981",
"0.301029995663981","0.477121254719662","0.301029995663981","0.301029995663981","0.301029995663981","0.301029995663981","0.301029995663981",
"0.602059991327962","0.301029995663981","0.845098040014257","1.92941892571429","1.27875360095283","0.698970004336019","1.38021124171161",
"1.20411998265592","1.38021124171161","1.14612803567824","1","1.07918124604762","1.17609125905568","0.845098040014257","0.698970004336019",
"0.778151250383644","0.301029995663981","0.845098040014257","1.64345267648619","1.46239799789896","1.34242268082221","1.34242268082221",
"0.778151250383644"))
Next, I set the factors:
# Set factors
str(data)
data$Date<-as.factor(data$Date)
data$Site<-as.factor(data$Site)
data$Species<-as.factor(data$Species)
data$Mean<-as.numeric(data$Mean)
str(data)
When I try a t-test with compare_means() from ggpubr:
compare_means(Mean ~ Species, data = data, group.by = "Date", method = "t.test")
This is the error:
Error in `mutate()`:
ℹ In argument: `p = purrr::map(...)`.
Caused by error in `purrr::map()`:
ℹ In index: 5.
ℹ With name: Date.2021.12.
Caused by error in `t.test.default()`:
! data are essentially constant
Run `rlang::last_trace()` to see where the error occurred.
Similarly, when I use this in ggplot:
ggplot(data, aes(x = Date, y = Mean, fill = Species)) +
  geom_boxplot() +
  stat_compare_means(data = data, method = "t.test", label = "p.signif") +
  theme(axis.text.x = element_text(angle = 90, vjust = 0.5, hjust = 1))
Warning message:
Computation failed in `stat_compare_means()`
Caused by error in `mutate()`:
ℹ In argument: `p = purrr::map(...)`.
Caused by error in `purrr::map()`:
ℹ In index: 5.
ℹ With name: x.5.
Caused by error in `t.test.default()`:
! data are essentially constant
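Both failures point at the fifth date, 2021.12. Below is a minimal sketch of reproducing the error for that group alone with base R t.test(), assuming only the four 2021.12 values copied from the data above (the names `a`, `b` and `msg` are just for illustration):

```r
# Mean values for Date 2021.12, copied from the data frame above
a <- c(0.301029995663981, 0.301029995663981)  # Species A
b <- c(0.301029995663981, 0.301029995663981)  # Species B

# Both groups have zero standard deviation
sd(a)
sd(b)

# Capture the error message instead of stopping the script
msg <- tryCatch(
  {
    t.test(a, b)
    "no error"
  },
  error = function(e) conditionMessage(e)
)
msg
```

With no spread in either group, the standard error of the difference is zero, so the t statistic cannot be computed.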
What is the best solution that keeps the data usable in a t-test?
Finding the sd of Mean for each Date-Species combination and then filtering out any Dates where any sd is 0 will do the trick. You could even just pipe the filtered data to compare_means():
library(dplyr)
library(ggpubr)
data <- data.frame(Date = c("2021.08","2021.08","2021.09","2021.09","2021.09","2021.10","2021.10","2021.10","2021.11","2021.11","2021.11","2021.11","2021.11","2021.12","2021.12","2022.01","2022.01","2022.01","2022.01","2022.08","2022.08","2022.08","2022.08","2022.08","2022.09","2022.09","2022.10","2022.10","2022.10","2022.11","2022.11","2022.11","2022.11","2022.11","2022.12","2022.12","2022.12","2022.12","2023.01","2023.01","2023.01","2023.01","2021.08","2021.08","2021.09","2021.09","2021.09","2021.10","2021.10","2021.10","2021.11","2021.11","2021.11","2021.11","2021.11","2021.12","2021.12","2022.01","2022.01","2022.01","2022.01","2022.08","2022.08","2022.08","2022.08","2022.08","2022.09","2022.09","2022.09","2022.09","2022.10","2022.10","2022.10","2022.10","2022.11","2022.11","2022.11","2022.11","2022.11","2022.12","2022.12","2022.12","2022.12","2023.01","2023.01","2023.01","2023.01"),
Species = c("A","A","A","A","A","A","A","A","A","A","A","A","A","A","A","A","A","A","A","A","A","A","A","A","A","A","A","A","A","A","A","A","A",
"A","A","A","A","A","A","A","A","A","B","B","B","B","B","B","B","B","B","B","B","B","B","B","B","B","B","B","B","B","B","B","B","B","B","B","B",
"B","B","B","B","B","B","B","B","B","B","B","B","B","B","B","B","B","B"),
Site = c("Something","Something","Something","Something","Something","Something","Something","Something","Something","Something","Something",
"Something","Something","Something","Something","Something","Something","Something","Something","Something","Something","Something","Something",
"Something","Something","Something","Something","Something","Something","Something","Something","Something","Something","Something","Something",
"Something","Something","Something","Something","Something","Something","Something","Something","Something","Something","Something","Something",
"Something","Something","Something","Something","Something","Something","Something","Something","Something","Something","Something","Something",
"Something","Something","Something","Something","Something","Something","Something","Something","Something","Something","Something","Something",
"Something","Something","Something","Something","Something","Something","Something","Something","Something","Something","Something","Something",
"Something","Something","Something","Something"),
Mean = c("0.301029995663981","1.07918124604762","0.698970004336019","1.23044892137827","1.53147891704226","1.41497334797082","1.7160033436348",
"0.698970004336019","1.39794000867204","1","0.301029995663981","0.301029995663981","0.477121254719662","0.301029995663981","0.301029995663981",
"0.301029995663981","0.477121254719662","0.301029995663981","0.301029995663981","0.845098040014257","0.301029995663981","0.301029995663981",
"0.477121254719662","0.698970004336019","1.23044892137827","1.41497334797082","1.95904139232109","1.5910646070265","1.53147891704226",
"1.14612803567824","1.57978359661681","1.34242268082221","0.778151250383644","0.301029995663981","0.301029995663981","0.477121254719662",
"0.301029995663981","1.20411998265592","0.845098040014257","1.17609125905568","1.20411998265592","0.698970004336019","0.301029995663981",
"0.698970004336019","0.698970004336019","0.903089986991944","1.14612803567824","0.301029995663981","0.602059991327962","0.301029995663981",
"0.845098040014257","0.698970004336019","0.698970004336019","0.301029995663981","0.698970004336019","0.301029995663981","0.301029995663981",
"0.301029995663981","0.477121254719662","0.301029995663981","0.301029995663981","0.301029995663981","0.301029995663981","0.301029995663981",
"0.602059991327962","0.301029995663981","0.845098040014257","1.92941892571429","1.27875360095283","0.698970004336019","1.38021124171161",
"1.20411998265592","1.38021124171161","1.14612803567824","1","1.07918124604762","1.17609125905568","0.845098040014257","0.698970004336019",
"0.778151250383644","0.301029995663981","0.845098040014257","1.64345267648619","1.46239799789896","1.34242268082221","1.34242268082221",
"0.778151250383644"))
data$Date<-as.factor(data$Date)
data$Site<-as.factor(data$Site)
data$Species<-as.factor(data$Species)
data$Mean<-as.numeric(data$Mean)
# Drop every Date in which at least one Species has zero standard deviation
data %>%
  group_by(Date, Species) %>%
  mutate(s = sd(Mean)) %>%
  group_by(Date) %>%
  filter(!any(s == 0)) %>%
  compare_means(Mean ~ Species, data = ., group.by = "Date", method = "t.test")
#> # A tibble: 11 × 9
#> Date .y. group1 group2 p p.adj p.format p.signif method
#> <fct> <chr> <chr> <chr> <dbl> <dbl> <chr> <chr> <chr>
#> 1 2021.08 Mean A B 0.718 1 0.718 ns T-test
#> 2 2021.09 Mean A B 0.451 1 0.451 ns T-test
#> 3 2021.10 Mean A B 0.0889 0.89 0.089 ns T-test
#> 4 2021.11 Mean A B 0.850 1 0.850 ns T-test
#> 5 2022.01 Mean A B 1 1 1.000 ns T-test
#> 6 2022.08 Mean A B 0.234 1 0.234 ns T-test
#> 7 2022.09 Mean A B 0.670 1 0.670 ns T-test
#> 8 2022.10 Mean A B 0.0707 0.78 0.071 ns T-test
#> 9 2022.11 Mean A B 0.783 1 0.783 ns T-test
#> 10 2022.12 Mean A B 0.399 1 0.399 ns T-test
#> 11 2023.01 Mean A B 0.255 1 0.255 ns T-test
Created on 2023-06-01 with reprex v2.0.2
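To see which dates the filter removes before testing, the same grouping can be summarised first. This is only a minimal sketch, assuming a small made-up data frame `toy` with the same column names as above (its values are hypothetical, not the real measurements); `sd_table`, `dropped` and `kept` are illustrative names.

```r
library(dplyr)

# Hypothetical toy data: date "d2" is constant in both species
toy <- data.frame(
  Date    = rep(c("d1", "d1", "d2", "d2"), times = 2),
  Species = rep(c("A", "B"), each = 4),
  Mean    = c(0.3, 0.5, 0.3, 0.3,   # Species A
              0.4, 0.7, 0.3, 0.3)   # Species B
)

# Standard deviation and group size for each Date-Species combination
sd_table <- toy %>%
  group_by(Date, Species) %>%
  summarise(s = sd(Mean), n = n(), .groups = "drop")

# Dates where at least one species has zero (or undefined) spread
dropped <- sd_table %>%
  filter(is.na(s) | s == 0) %>%
  distinct(Date) %>%
  pull(Date)

# Keep only the remaining dates
kept <- toy %>% filter(!Date %in% dropped)

dropped
unique(kept$Date)
```

With the real data, `kept` could then be passed to compare_means(), and presumably to the ggplot() call as well, in place of `data`. The `is.na(s)` check additionally covers groups with a single observation, where sd() returns NA.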