TL;DR I'm struggling to specify a PROC MIXED
SAS model in R using nlme:::lme
.
I am trying to reproduce the SAS code provided in Hamlet et al. (2003) in R code. The SAS code that the authors use makes use of PROC MIXED
:
proc mixed;
class persnum vtype replicate;
model response = vtype / solution ddfm=kr;
random vtype / type=un subject=persnum g gcorr v vcorr;
repeated vtype / type=un subject=replicate(persnum) r rcorr;
run;
I know that solution ddfm=kr
specifies the Kenward-Roger method for computing the denominator degrees of freedom for the fixed effects, which isn't available in R.
Ignoring that, I've tried to reproduce the model using nlme::lme()
(and even geepack::geeglm()
even though I know I'm switching out the random effects). My nlme::lme()
R code is:
model <-
nlme::lme(
fixed = response ~ vtype
,random = list( ID = pdSymm( ~ vtype) )
,correlation = corSymm( form = ~ vtype | persnum/replicate)
,method = "ML"
,data = phPACO_long
)
The model errors:
Error in if (length(uCov) != maxCov) { : missing value where TRUE/FALSE needed
I can't find uCov
or maxCov
in the nlme
git repo so I don't know how to find and fix the error. Does anyone know about this error, where it comes from, and how I might fix it?
I've seen that TueBimet reproduced PROC MIXED
code that had a REPEATED
statement but not a RANDOM
statement, just by using the random
argument in nlme::lme()
. I don't have SAS so I can't check the validity of this approach. Also, the SAS code I'm trying to translate has different groupings for the RANDOM
statement and the REPEATED
statement, so my gut tells me this won't work for me.
The structure of the long-form data is:
structure(list(persnum= structure(c(1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L,
2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L,
3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 4L, 4L, 4L, 4L, 4L, 4L,
4L, 4L, 4L, 4L, 5L, 5L, 5L, 5L, 5L, 5L, 5L, 5L, 5L, 5L, 5L, 5L,
5L, 5L, 5L, 5L, 6L, 6L, 6L, 6L, 6L, 6L, 6L, 6L, 6L, 6L, 6L, 6L,
7L, 7L, 7L, 7L, 7L, 7L, 8L, 8L, 8L, 8L, 8L, 8L, 8L, 8L, 8L, 8L,
8L, 8L, 8L, 8L, 8L, 8L), levels = c("1", "2", "3", "4", "5",
"6", "7", "8"), class = "factor"), replicate= structure(c(1L, 1L,
2L, 2L, 3L, 3L, 4L, 4L, 1L, 1L, 2L, 2L, 3L, 3L, 4L, 4L, 1L, 1L,
2L, 2L, 3L, 3L, 4L, 4L, 5L, 5L, 6L, 6L, 7L, 7L, 8L, 8L, 9L, 9L,
1L, 1L, 2L, 2L, 3L, 3L, 4L, 4L, 5L, 5L, 1L, 1L, 2L, 2L, 3L, 3L,
4L, 4L, 5L, 5L, 6L, 6L, 7L, 7L, 8L, 8L, 1L, 1L, 2L, 2L, 3L, 3L,
4L, 4L, 5L, 5L, 6L, 6L, 1L, 1L, 2L, 2L, 3L, 3L, 1L, 1L, 2L, 2L,
3L, 3L, 4L, 4L, 5L, 5L, 6L, 6L, 7L, 7L, 8L, 8L), levels = c("1",
"2", "3", "4", "5", "6", "7", "8", "9"), class = "ordered"), vtype= structure(c(2L,
1L, 2L, 1L, 2L, 1L, 2L, 1L, 2L, 1L, 2L, 1L, 2L, 1L, 2L, 1L, 2L,
1L, 2L, 1L, 2L, 1L, 2L, 1L, 2L, 1L, 2L, 1L, 2L, 1L, 2L, 1L, 2L,
1L, 2L, 1L, 2L, 1L, 2L, 1L, 2L, 1L, 2L, 1L, 2L, 1L, 2L, 1L, 2L,
1L, 2L, 1L, 2L, 1L, 2L, 1L, 2L, 1L, 2L, 1L, 2L, 1L, 2L, 1L, 2L,
1L, 2L, 1L, 2L, 1L, 2L, 1L, 2L, 1L, 2L, 1L, 2L, 1L, 2L, 1L, 2L,
1L, 2L, 1L, 2L, 1L, 2L, 1L, 2L, 1L, 2L, 1L, 2L, 1L), levels = c("pH", "P_aCO2"), class = "factor"), response= c(6.68, 3.97, 6.53, 4.12,
6.43, 4.09, 6.33, 3.97, 6.85, 5.27, 7.06, 5.37, 7.13, 5.41, 7.17,
5.44, 7.4, 5.67, 7.42, 3.64, 7.41, 4.32, 7.37, 4.73, 7.34, 4.96,
7.35, 5.04, 7.28, 5.22, 7.3, 4.82, 7.34, 5.07, 7.36, 5.67, 7.33,
5.1, 7.29, 5.53, 7.3, 4.75, 7.35, 5.51, 7.35, 4.28, 7.3, 4.44,
7.3, 4.32, 7.37, 3.23, 7.27, 4.46, 7.28, 4.72, 7.32, 4.75, 7.32,
4.99, 7.38, 4.78, 7.3, 4.73, 7.29, 5.12, 7.33, 4.93, 7.31, 5.03,
7.33, 4.93, 6.86, 6.85, 6.94, 6.44, 6.92, 6.52, 7.19, 5.28, 7.29,
4.56, 7.21, 4.34, 7.25, 4.32, 7.2, 4.41, 7.19, 3.69, 6.77, 6.09,
6.82, 5.58), persnum.replicate= structure(c(1L, 1L, 9L, 9L, 17L, 17L,
25L, 25L, 2L, 2L, 10L, 10L, 18L, 18L, 26L, 26L, 3L, 3L, 11L,
11L, 19L, 19L, 27L, 27L, 32L, 32L, 37L, 37L, 41L, 41L, 44L, 44L,
47L, 47L, 4L, 4L, 12L, 12L, 20L, 20L, 28L, 28L, 33L, 33L, 5L,
5L, 13L, 13L, 21L, 21L, 29L, 29L, 34L, 34L, 38L, 38L, 42L, 42L,
45L, 45L, 6L, 6L, 14L, 14L, 22L, 22L, 30L, 30L, 35L, 35L, 39L,
39L, 7L, 7L, 15L, 15L, 23L, 23L, 8L, 8L, 16L, 16L, 24L, 24L,
31L, 31L, 36L, 36L, 40L, 40L, 43L, 43L, 46L, 46L), levels = c("1.1",
"2.1", "3.1", "4.1", "5.1", "6.1", "7.1", "8.1", "1.2", "2.2",
"3.2", "4.2", "5.2", "6.2", "7.2", "8.2", "1.3", "2.3", "3.3",
"4.3", "5.3", "6.3", "7.3", "8.3", "1.4", "2.4", "3.4", "4.4",
"5.4", "6.4", "8.4", "3.5", "4.5", "5.5", "6.5", "8.5", "3.6",
"5.6", "6.6", "8.6", "3.7", "5.7", "8.7", "3.8", "5.8", "8.8",
"3.9"), class = c("ordered", "factor"))), row.names = c(NA, -94L
), class = c("tbl_df", "tbl", "data.frame"))
p.s.
The model fits if I exclude the mvar
in the correlation
argument but the resulting covariance matrix does not match Hamlett et al.'s (which makes sense because it assumes a different correlation structure).
This is indeed a bit of a challenge, let's start with the "ground truth" results from SAS (ignoring the Kenward-Roger degrees of freedom & P-values [edit] and using maximum likelihood, see below).
The one thing I did was change the repeated
statement from a variance-covariance parametrization to a variance-correlation one; this has no effect on the fit but more closely reports results in the same scale as nlme::lme
which parametrizes R-side as correlation only:
Solution for Fixed Effects
Effect vtype Estimate Std.Err
Intercept 5.0081686402 0.2436367946
vtype P_aCO2 2.1068507515 0.2529814138
Covariance Parameter Estimates
Cov Parm Subject Estimate
UN(1,1) persnum 0.074648628
UN(2,1) persnum 0.025216431
UN(2,2) persnum 0.425242835
Var(1) replicate(persnum) 0.011559757
Var(2) replicate(persnum) 0.254671475
Corr(2,1) replicate(persnum) -0.509151762
You will need several changes to your nlme::lme
call to get here, but it is possible.
The cause of your error is that lme
can only handle numeric covariates in its random effect specification. You'll have to manually encode dummies for vtype
, though as we'll see next you only need one:
phPACO_long <- within(phPACO_long, {vPH = 0+(vtype=="pH")})
By default SAS does not add an overall intercept to its random/repeated effects, so you get one dummy per level of vtype
. If you try to do this in your current model you'll get an error from lme
because it requires covariates to have a single level within a persnum/replicate
block, which isn't true in your data. But fret not, we can include an overall intercept plus a single dummy and get to the same final fit. This applies to both random
and correlation
.
Second, there's and error in your random
specification: the effect needs to have the name of the block, persnum
. Also, correlation
blocks must be nested within random
blocks, but luckily that is the case here. (They must have the same names to pass this check)
Third, SAS defaults to restricted maximum likelihood, as does lme
. Remove method="ML"
or explicitly set the default method="REML"
. Edit: as it turns out they use ML in the paper (your SAS specification left it at the default REML), so to reproduce the paper exactly you'll need to use ML.
And last, as mentioned lme
only accepts a correlation matrix in correlation
, so you need to allow for heterogeneous residual variance via weights = varIdent()
. This effectively produces an unstructured R-side covariance (or variance-correlation) matrix.
Putting all of that together:
model <- nlme::lme(
fixed = response ~ vtype,
random = list("persnum" = pdSymm(form = ~1+vPH)),
correlation = corSymm(form = ~1+vPH | persnum/replicate),
weights = varIdent(form = ~1 | vtype),
method = "ML",
data = phPACO_long
)
summary(model)
#> Random effects:
#> StdDev
#> (Intercept) 0.2732194
#> vPH 0.6704198
#> Residual 0.1075163
## Note that above is sigma, SAS reports sigma^2
#> Correlation:
#> 1
#> 2 -0.509
#> Variance function:
#> Parameter estimates:
#> P_aCO2 pH
#> 1.000000 4.693707
#> Fixed effects: response ~ vtype
#> Value Std.Error
#> (Intercept) 5.008169 0.2460781
#> vtypeP_aCO2 2.106851 0.2555021
Not everything is reported on the same scale here, but the estimates match very well. For example, SAS UN(1,1) = 0.074648628
is the random intercept SD 0.2732194^2 = 0.07464884
, the R-side Var(1) = 0.011559757
is the residual SD 0.1075163^2 = 0.01155975
, Var(2) = 0.254671475
is the residual SD times its weight (0.1075163 * 4.693707)^2 = 0.2546716
, et cetera. Thanks to the shared variance-correlation parametrization the R off-diagonal also matches (within rounding).
To convert between variance-covariance and variance-correlation parametrization, recall that corr_AB = cov_AB / (sd_A * sd_B)
-- this sure was easier in MathJax. So, to recover the off-diagonal covariance as in the paper you simply calculate (using the SAS
values here) sqrt(0.011559757) * sqrt(0.254671475) * -0.509151762 = -0.0276256
.