I found that function varIdent
has an undesired behavior which is dependent on data order. In detail, the reference category depends on data order, see the next example
set.seed(123)
library(tidyverse)
library(nlme)
#>
#> Attaching package: 'nlme'
#> The following object is masked from 'package:dplyr':
#>
#> collapse
n_smpl<-10
z0<-data.frame(sex=sample(c("F","M"),n_smpl,prob = c(0.6,0.4),replace = T) %>% factor)
summary(z0)
#> sex
#> F:6
#> M:4
X<-model.matrix(~sex, z0)
sd_e<-1.3
e<-rnorm(nrow(z0),0,sd_e)
my_bts<-c(10,1.4)
z0$y<-(X %*% my_bts+e)%>% drop
m0<-gls(model=y~sex,weights=varIdent(form=~1|sex),data=z0[order(z0$sex),])
m1<-gls(model=y~sex,weights=varIdent(form=~1|sex),data=z0[order(z0$sex,decreasing =T),])
summary(m0)
#> Generalized least squares fit by REML
#> Model: y ~ sex
#> Data: z0[order(z0$sex), ]
#> AIC BIC logLik
#> 35.82722 36.14498 -13.91361
#>
#> Variance function:
#> Structure: Different standard deviations per stratum
#> Formula: ~1 | sex
#> Parameter estimates:
#> F M
#> 1.0000000 0.5305269
#>
#> Coefficients:
#> Value Std.Error t-value p-value
#> (Intercept) 10.344227 0.5847687 17.689434 0.0000
#> sexM 0.967754 0.6973690 1.387721 0.2026
#>
#> Correlation:
#> (Intr)
#> sexM -0.839
#>
#> Standardized residuals:
#> Min Q1 Med Q3 Max
#> -1.38845834 -0.72023259 -0.02681155 0.85333098 1.31623645
#>
#> Residual standard error: 1.432385
#> Degrees of freedom: 10 total; 8 residual
summary(m1)
#> Generalized least squares fit by REML
#> Model: y ~ sex
#> Data: z0[order(z0$sex, decreasing = T), ]
#> AIC BIC logLik
#> 35.82722 36.14498 -13.91361
#>
#> Variance function:
#> Structure: Different standard deviations per stratum
#> Formula: ~1 | sex
#> Parameter estimates:
#> M F
#> 1.000000 1.884919
#>
#> Coefficients:
#> Value Std.Error t-value p-value
#> (Intercept) 10.344227 0.5847687 17.689434 0.0000
#> sexM 0.967754 0.6973690 1.387721 0.2026
#>
#> Correlation:
#> (Intr)
#> sexM -0.839
#>
#> Standardized residuals:
#> Min Q1 Med Q3 Max
#> -1.38845834 -0.72023259 -0.02681155 0.85333098 1.31623645
#>
#> Residual standard error: 0.7599187
#> Degrees of freedom: 10 total; 8 residual
Created on 2023-04-04 by the reprex package (v2.0.1)
Look at Variance function
outputs for m0
and m1
models, the reference category is different in each model. This does not happens in the mean structure component of the model.
This seems to be a bug, or at least an infelicity, in nlme
. If your sex
variable wasn't already a factor with defined levels I would say it was your fault, or at least expected - but it is.
I have reported this here.
A patched version that fixes this problem is available at https://github.com/bbolker/nlme: remotes::install_github("bbolker/nlme")
to install it (you'll need to have development tools installed to install from source).
A slightly more compact reproducible example:
library(nlme)
dd <- transform(mtcars,
am = factor(am))
dd <- dd[order(dd$am),]
dd_rev <- dd[order(dd$am, decreasing = TRUE),]
m0 <- gls(mpg ~ hp, weights = varIdent(form = ~1|am), data = dd)
m1 <- update(m0, data = dd_rev)
coef(m0$modelStruct$varStruct) # 0.9776946
coef(m1$modelStruct$varStruct) # -0.9776946