I am quite new to R and is setting up a mixed effect model with lmer. I have a large dataset and want to include 3 random effects that are nested. I have data from 2 years, 2021 and 2022, and there are rounds of experiments every year, that are named experiment 1, 2 and 3 and so on on each year. The problem is that for the random effects, the number of experiment rounds are calculated wrong, as the numbering of the experiments are the same for each year. How can I include "year" for the random effects so the number of groups will be correct?
This is a example of the set up of my df :
example<-data.frame("Site" = c(1,1,1,1,1,1,1,1,1,1,1,1,2,2,2,2,2,2,2,2,2,2,2,2),
"Square" = c(1,1,1,2,2,2,1,1,1,2,2,2,1,1,1,2,2,2,1,1,1,2,2,2),
"Round" = c(1,2,3,1,2,3,1,2,3,1,2,3,1,2,3,1,2,3,1,2,3,1,2,3),
"Year" = c(2021,2021,2021,2021,2021,2021,2022,2022,2022,2022,2022,2022,2021,2021,2021,2021,2021,2021,2022,2022,2022,2022,2022,2022),
"Variable" = c(2,2,2,2,2,2,1,1,1,1,1,1,2,2,2,2,2,2,1,1,1,1,1,1),
"Outcome" = c(20,10,39,22,25,29,13,15,12,10,9,15,40,33,42,29,35,50,25,23,22,26,24,23))
Here is my code where I want to investigate Outcome predicted by Variable (fixed effect) with Site , Square and Round as nested random effects:
model8<-lmer(Outcome ~ Variable + (1|Site/Square/Round),
data=example)
summary(model8)
And here is the output:
boundary (singular) fit: see help('isSingular')
Linear mixed model fit by REML. t-tests use Satterthwaite's method ['lmerModLmerTest']
Formula: Outcome ~ Variable + (1 | Site/Square/Round)
Data: example
REML criterion at convergence: 150.7
Scaled residuals:
Min 1Q Median 3Q Max
-2.44099 -0.39914 -0.02029 0.45294 2.26577
Random effects:
Groups Name Variance Std.Dev.
Round:(Square:Site) (Intercept) 0.00 0.000
Square:Site (Intercept) 0.00 0.000
Site (Intercept) 78.12 8.838
Residual 37.96 6.161
Number of obs: 24, groups: Round:(Square:Site), 12; Square:Site, 4; Site, 2
Fixed effects:
Estimate Std. Error df t value Pr(>|t|)
(Intercept) 5.000 7.408 1.813 0.675 0.575
Variable 13.083 2.515 21.000 5.201 3.73e-05 ***
---
Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
Correlation of Fixed Effects:
(Intr)
Variable -0.509
optimizer (nloptwrap) convergence code: 0 (OK)
boundary (singular) fit: see help('isSingular')
Please do not pay attention to the singularity error, this is just an example. Look at the number of observations:
Number of obs: 24, groups: Round:(Square:Site), 12; Square:Site, 4; Site, 2
Here the number of rounds is incorrect, it should be 24 (same as number of observations). How should I specify year for the random effect Round? Is it at all possible or I am on the wrong track? The Sites and Squares are the same year after year, but the rounds are unique for each year/the dataset. Or is it just easier to change the Rounds to 1, 2, 3 at 2021 and then 4,5,6 on year 2022 in the actual dataframe? I am just in the process of setting up my model, and I have started with the random effects.
If you want the values of Round
to be disambiguated by year you can do this:
lmer(Outcome ~ Variable + (1|Site/Square) +
+ (1|Year:Site:Square:Round), data = example)
However, this will trigger the following error:
Error: number of levels of each grouping factor must be < number of observations (problems: Year:Site:Square:Round)
which occurs because, for any Year/Site/Square/Round combination, there is only one observation in the data set. This means that the variance associated with that random effect is confounded with the residual variance (put another way, these two variances are jointly unidentifiable). (There is another explanation of this here.)
If your real data set (you said you had a "large dataset") has more than one observation per Year/Site/Square/Round combination, then this won't be an issue.
I would also say that treating Site
as a random effect if there are only two levels is unlikely to work well. (Again, maybe you have more sites in your real data set.)