I would like some help to use R to analyze an experiment in which three groups of participants were shown two types of stimulus three times each. The dependent variable is a continuous measure. Here is an example of how the dataset looks like.
SubjectID Group Trial StimType Measure
1 1 group1 trial3_stimA A 0.55908866
2 2 group1 trial3_stimA A 0.98884446
3 3 group2 trial3_stimA A 0.00000000
4 4 group2 trial3_stimA A 0.27067991
5 5 group3 trial3_stimA A 0.37169285
6 6 group3 trial3_stimA A 0.42113984
7 1 group1 trial3_stimB B 0.00000000
8 2 group1 trial3_stimB B 0.49892807
9 3 group2 trial3_stimB B 0.14602589
10 4 group2 trial3_stimB B 0.50946555
11 5 group3 trial3_stimB B 0.25572820
12 6 group3 trial3_stimB B 0.22932966
13 1 group1 trial1_stimA A 0.42207604
14 2 group1 trial1_stimA A 0.85599588
15 3 group2 trial1_stimA A 0.36428381
16 4 group2 trial1_stimA A 0.46679336
17 5 group3 trial1_stimA A 0.69379734
18 6 group3 trial1_stimA A 0.55607716
19 1 group1 trial1_stimB B 0.24261465
20 2 group1 trial1_stimB B 0.35176384
21 3 group2 trial1_stimB B 0.21116215
22 4 group2 trial1_stimB B 0.33112544
23 5 group3 trial1_stimB B 0.00000000
24 6 group3 trial1_stimB B 0.00000000
25 1 group1 trial2_stimA A 0.05506943
26 2 group1 trial2_stimA A 0.22537470
27 3 group2 trial2_stimA A 0.00000000
28 4 group2 trial2_stimA A 0.18511144
29 5 group3 trial2_stimA A 0.15586156
30 6 group3 trial2_stimA A 0.04467100
31 1 group1 trial2_stimB B 0.03890585
32 2 group1 trial2_stimB B 0.29787709
33 3 group2 trial2_stimB B 0.00000000
34 4 group2 trial2_stimB B 0.28971992
35 5 group3 trial2_stimB B 0.12993238
36 6 group3 trial2_stimB B 0.05066011
Here is the structure of my data
'data.frame': 36 obs. of 5 variables:
$ SubjectID: Factor w/ 6 levels "1","2","3","4",..: 1 2 3 4 5 6 1 2 3 4 ...
$ Group : Factor w/ 3 levels "group1","group2",..: 1 1 2 2 3 3 1 1 2 2 ...
$ Trial : Factor w/ 6 levels "trial1_stimA",..: 5 5 5 5 5 5 6 6 6 6 ...
$ StimType : Factor w/ 2 levels "A","B": 1 1 1 1 1 1 2 2 2 2 ...
$ Measure : num 0.559 0.989 0 0.271 0.372 ...
I need to run a mixed ANOVA with groups as between subjects factor and trials and stimulus type as within subjects factor. I have tried using three different R packages and different syntaxes but R either returns an error message or the output is missing the interactions group x trial x stim type.
For example, when I use anova_test() from the rstatix package
#Trying mixed ANOVA with rstatix
>
> mixed.anova <- anova_test(
+ data = prepared_data, dv = Measure, wid = SubjectID,
+ between = Group, within = c(Trial,StimType)
+ )
Error in check.imatrix(X.design) :
Terms in the intra-subject model matrix are not orthogonal.
> get_anova_table(all_subjects)
Error in is.data.frame(x) : object 'all_subjects' not found
When I use the the aov from afex package
> #using afex
>
>
> mixed.anova2 <- aov_car(Measure ~ Group*Trial*StimType + Error(1|SubjectID/(Trial*StimType)), prepared_data)
Error: Empty cells in within-subjects design (i.e., bad data structure).
table(data[c("Trial", "StimType")])
# StimType
# Trial A B
# trial1_stimA 6 0
# trial1_stimB 0 6
# trial2_stimA 6 0
# trial2_stimB 0 6
# trial3_stimA 6 0
# trial3_stimB 0 6
>
> aov.bww
Call:
aov(formula = SCR ~ Group * Trial * CSType + Error(SubjectID) +
Group, data = sixPhasesAbs2)
Grand Mean: 397.1325
Stratum 1: SubjectID
Terms:
Group Residuals
Sum of Squares 187283464 6399838881
Deg. of Freedom 2 81
Residual standard error: 8888.777
18 out of 20 effects not estimable
Estimated effects may be unbalanced
Stratum 2: Within
Terms:
Trial Group:Trial Residuals
Sum of Squares 158766098 374583941 12799639740
Deg. of Freedom 5 10 405
Residual standard error: 5621.748
12 out of 27 effects not estimable
Estimated effects may be unbalanced
Finally, I tried using lmer
> #using lmer
> model = lmer(Measure ~Group*Trial*StimType +(1|SubjectID), data = prepared_data)
fixed-effect model matrix is rank deficient so dropping 18 columns / coefficients
>
> summary(model)
Linear mixed model fit by REML. t-tests use Satterthwaite's method ['lmerModLmerTest']
Formula: Measure ~ Group * Trial * StimType + (1 | SubjectID)
Data: prepared_data
REML criterion at convergence: -16.8
Scaled residuals:
Min 1Q Median 3Q Max
-1.1854 -0.5424 0.0000 0.5424 1.1854
Random effects:
Groups Name Variance Std.Dev.
SubjectID (Intercept) 0.024226 0.15565
Residual 0.006861 0.08283
Number of obs: 36, groups: SubjectID, 6
Fixed effects:
Estimate Std. Error df t value Pr(>|t|)
(Intercept) 0.639036 0.124672 4.459241 5.126 0.005095 **
Groupgroup2 -0.223497 0.176313 4.459241 -1.268 0.267088
Groupgroup3 -0.014099 0.176313 4.459241 -0.080 0.939728
Trialtrial1_stimB -0.341847 0.082829 15.000000 -4.127 0.000896 ***
Trialtrial2_stimA -0.498814 0.082829 15.000000 -6.022 2.34e-05 ***
Trialtrial2_stimB -0.470644 0.082829 15.000000 -5.682 4.35e-05 ***
Trialtrial3_stimA 0.134931 0.082829 15.000000 1.629 0.124124
Trialtrial3_stimB -0.389572 0.082829 15.000000 -4.703 0.000283 ***
Groupgroup2:Trialtrial1_stimB 0.197452 0.117138 15.000000 1.686 0.112550
Groupgroup3:Trialtrial1_stimB -0.283091 0.117138 15.000000 -2.417 0.028865 *
Groupgroup2:Trialtrial2_stimA 0.175831 0.117138 15.000000 1.501 0.154095
Groupgroup3:Trialtrial2_stimA -0.025857 0.117138 15.000000 -0.221 0.828271
Groupgroup2:Trialtrial2_stimB 0.199966 0.117138 15.000000 1.707 0.108413
Groupgroup3:Trialtrial2_stimB -0.063997 0.117138 15.000000 -0.546 0.592870
Groupgroup2:Trialtrial3_stimA -0.415129 0.117138 15.000000 -3.544 0.002946 **
Groupgroup3:Trialtrial3_stimA -0.363452 0.117138 15.000000 -3.103 0.007276 **
Groupgroup2:Trialtrial3_stimB 0.301779 0.117138 15.000000 2.576 0.021070 *
Groupgroup3:Trialtrial3_stimB 0.007164 0.117138 15.000000 0.061 0.952043
---
Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
Correlation matrix not shown by default, as p = 18 > 12.
Use print(x, correlation=TRUE) or
vcov(x) if you need it
fit warnings:
fixed-effect model matrix is rank deficient so dropping 18 columns / coefficients
>
> anova(model)
Missing cells for: Trialtrial1_stimB:StimTypeA, Trialtrial2_stimB:StimTypeA, Trialtrial3_stimB:StimTypeA, Trialtrial1_stimA:StimTypeB, Trialtrial2_stimA:StimTypeB, Trialtrial3_stimA:StimTypeB, Groupgroup1:Trialtrial1_stimB:StimTypeA, Groupgroup2:Trialtrial1_stimB:StimTypeA, Groupgroup3:Trialtrial1_stimB:StimTypeA, Groupgroup1:Trialtrial2_stimB:StimTypeA, Groupgroup2:Trialtrial2_stimB:StimTypeA, Groupgroup3:Trialtrial2_stimB:StimTypeA, Groupgroup1:Trialtrial3_stimB:StimTypeA, Groupgroup2:Trialtrial3_stimB:StimTypeA, Groupgroup3:Trialtrial3_stimB:StimTypeA, Groupgroup1:Trialtrial1_stimA:StimTypeB, Groupgroup2:Trialtrial1_stimA:StimTypeB, Groupgroup3:Trialtrial1_stimA:StimTypeB, Groupgroup1:Trialtrial2_stimA:StimTypeB, Groupgroup2:Trialtrial2_stimA:StimTypeB, Groupgroup3:Trialtrial2_stimA:StimTypeB, Groupgroup1:Trialtrial3_stimA:StimTypeB, Groupgroup2:Trialtrial3_stimA:StimTypeB, Groupgroup3:Trialtrial3_stimA:StimTypeB.
Interpret type III hypotheses with care.
Type III Analysis of Variance Table with Satterthwaite's method
Sum Sq Mean Sq NumDF DenDF F value Pr(>F)
Group 0.00723 0.003614 2 3 0.5267 0.6367151
Trial 0.96171 0.192343 5 15 28.0356 4.172e-07 ***
Group:Trial 0.44102 0.044102 10 15 6.4283 0.0007411 ***
StimType
Group:StimType
Trial:StimType
Group:Trial:StimType
---
Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
I have searched for similar questions but I have not found any answers that would help me with this problem. I suspect (given the error messages) that I might have to change the structure of the dataset, but I do not know how to do this as I am a beginner in R. How can I run a mixed ANOVA in this dataset?
Since Trial
contains information about both Trial
and StimType
, the fixed effect model matrix is rank deficient. One can see this from the afex::aov_car()
output in the original post that illustrates empty cells that are the effect of Trial
containing information from two distinct variables.
> mixed.anova2 <- aov_car(Measure ~ Group*Trial*StimType + Error(1|SubjectID/(Trial*StimType)), prepared_data)
Error: Empty cells in within-subjects design (i.e., bad data structure).
table(data[c("Trial", "StimType")])
# StimType
# Trial A B
# trial1_stimA 6 0
# trial1_stimB 0 6
# trial2_stimA 6 0
# trial2_stimB 0 6
# trial3_stimA 6 0
# trial3_stimB 0 6
>
> aov.bww
We can correct the rank deficiency by splitting the Trial
variable into two variables.
With the lme4
package and tidyr::separate()
to separate trial from stim value, the analysis looks like this:
textFile <- "rowId SubjectID Group Trial StimType Measure
1 1 group1 trial3_stimA A 0.55908866
2 2 group1 trial3_stimA A 0.98884446
3 3 group2 trial3_stimA A 0.00000000
4 4 group2 trial3_stimA A 0.27067991
5 5 group3 trial3_stimA A 0.37169285
6 6 group3 trial3_stimA A 0.42113984
7 1 group1 trial3_stimB B 0.00000000
8 2 group1 trial3_stimB B 0.49892807
9 3 group2 trial3_stimB B 0.14602589
10 4 group2 trial3_stimB B 0.50946555
11 5 group3 trial3_stimB B 0.25572820
12 6 group3 trial3_stimB B 0.22932966
13 1 group1 trial1_stimA A 0.42207604
14 2 group1 trial1_stimA A 0.85599588
15 3 group2 trial1_stimA A 0.36428381
16 4 group2 trial1_stimA A 0.46679336
17 5 group3 trial1_stimA A 0.69379734
18 6 group3 trial1_stimA A 0.55607716
19 1 group1 trial1_stimB B 0.24261465
20 2 group1 trial1_stimB B 0.35176384
21 3 group2 trial1_stimB B 0.21116215
22 4 group2 trial1_stimB B 0.33112544
23 5 group3 trial1_stimB B 0.00000000
24 6 group3 trial1_stimB B 0.00000000
25 1 group1 trial2_stimA A 0.05506943
26 2 group1 trial2_stimA A 0.22537470
27 3 group2 trial2_stimA A 0.00000000
28 4 group2 trial2_stimA A 0.18511144
29 5 group3 trial2_stimA A 0.15586156
30 6 group3 trial2_stimA A 0.04467100
31 1 group1 trial2_stimB B 0.03890585
32 2 group1 trial2_stimB B 0.29787709
33 3 group2 trial2_stimB B 0.00000000
34 4 group2 trial2_stimB B 0.28971992
35 5 group3 trial2_stimB B 0.12993238
36 6 group3 trial2_stimB B 0.05066011
"
data <- read.table(text = textFile,header = TRUE)
library(dplyr)
library(tidyr)
# split trial from Stim
data %>% separate(Trial,into = c("trialName","stimName")) -> data
library(lme4)
model = lmer(Measure ~Group*trialName*StimType +(1|SubjectID), data = data)
summary(model)
...and the output:
> summary(model)
Linear mixed model fit by REML ['lmerMod']
Formula: Measure ~ Group * trialName * StimType + (1 | SubjectID)
Data: data
REML criterion at convergence: -16.8
Scaled residuals:
Min 1Q Median 3Q Max
-1.1854 -0.5424 0.0000 0.5424 1.1854
Random effects:
Groups Name Variance Std.Dev.
SubjectID (Intercept) 0.024226 0.15565
Residual 0.006861 0.08283
Number of obs: 36, groups: SubjectID, 6
Fixed effects:
Estimate Std. Error t value
(Intercept) 0.63904 0.12467 5.126
Groupgroup2 -0.22350 0.17631 -1.268
Groupgroup3 -0.01410 0.17631 -0.080
trialNametrial2 -0.49881 0.08283 -6.022
trialNametrial3 0.13493 0.08283 1.629
StimTypeB -0.34185 0.08283 -4.127
Groupgroup2:trialNametrial2 0.17583 0.11714 1.501
Groupgroup3:trialNametrial2 -0.02586 0.11714 -0.221
Groupgroup2:trialNametrial3 -0.41513 0.11714 -3.544
Groupgroup3:trialNametrial3 -0.36345 0.11714 -3.103
Groupgroup2:StimTypeB 0.19745 0.11714 1.686
Groupgroup3:StimTypeB -0.28309 0.11714 -2.417
trialNametrial2:StimTypeB 0.37002 0.11714 3.159
trialNametrial3:StimTypeB -0.18266 0.11714 -1.559
Groupgroup2:trialNametrial2:StimTypeB -0.17332 0.16566 -1.046
Groupgroup3:trialNametrial2:StimTypeB 0.24495 0.16566 1.479
Groupgroup2:trialNametrial3:StimTypeB 0.51946 0.16566 3.136
Groupgroup3:trialNametrial3:StimTypeB 0.65371 0.16566 3.946
Correlation matrix not shown by default, as p = 18 > 12.
Use print(x, correlation=TRUE) or
vcov(x) if you need it
>
In the comments to my answer, the questioner explained that the data used to describe trials and stim types were different than what was represented in the data posted with the question. Given the contents of the comment, here is an approach with dplyr
to parse out the trial conditions and stim types.
# solving the data cleaning problem in comments
df <- data.frame(treatmentName = c("MeanCondPlusLate",
"MeanCondMinusLate",
"MeanExtPlusLate",
"MeanExtMinusLate",
"TestPlus1",
"TestMinus1"))
library(dplyr)
df %>% mutate(trial = case_when(grepl("Cond",treatmentName) == TRUE ~ 1,
grepl("Ext",treatmentName) == TRUE ~ 2,
grepl("Test",treatmentName) == TRUE ~ 3),
stimLevel = case_when(grepl("Plus",treatmentName) == TRUE ~ "A",
grepl("Minus",treatmentName) == TRUE ~ "B"))
...and the output:
treatmentName trial stimLevel
1 MeanCondPlusLate 1 A
2 MeanCondMinusLate 1 B
3 MeanExtPlusLate 2 A
4 MeanExtMinusLate 2 B
5 TestPlus1 3 A
6 TestMinus1 3 B
>