After running a statistical model in R (e.g., glm
, lm
, lme4::lmer
, etc.), I run the summary()
command with corr=TRUE
to get the Correlation of Coefficients table. It features a matrix of correlations for the intercept and the independent variables in the given model. What I would like to do is extract the columns that make those correlations possible. For instance, if the model is x ~ a + b
, and the table looks like this...
(Intercept) a
a 0.##
b 0.## 0.##
I want to extract the columns that allow for the correlation between a
and b
.
I have found various commands to extract fitted values and residuals, but nothing to yet that would give me values for a
and b
that correspond to their correlations in that table.
What you want is the summary(.)$correlation
.
If you want to extract something from an object
, look into the structure str(object)
to find the desired element. (If you are working in RStudio, perhaps start with str(object, max.levels=1)
as it tends to hang up or crash if the output is too large.)
Looking into the source code of fitted()
methods such as stats:::fitted.lm
reveals, that they essentially extract the $fitted.values
element of the respective object (similar applies to resid()
).
Same way we could write a function ex_corr_lm()
. To obtain the RHS variables automatically we may use all.vars()
on the formula element.
The object
also contains the model
data which is the part of data from the original data used to fit the model. If there were NA
s, they might be removed. We can use it to reconstruct the columns that "make those correlations possible".
> ex_corr_lm <- \(object, data=FALSE, vars) {
+ if (missing(vars)) {
+ vars <- all.vars(s$call$formula)[-1]
+ }
+ if (inherits(object, c("lm", "glm"))) {
+ s <- summary(object, corr=TRUE)
+ corr <- s$correlation[vars, vars]
+ corr_data <- object$model[, vars]
+ } else if (inherits(object, "lmerMod")) {
+ s <- summary(object, corr=TRUE)
+ corr <- as.matrix(s$vcov@factors$correlation)[vars, vars]
+ corr_data <- object@frame[, vars]
+ } else {
+ stop('not implemented.')
+ }
+ if (data) {
+ list(corr=corr, corr_data=corr_data)
+ } else {
+ corr
+ }
+ }
> ex_corr_lm(f_lm)
a b
a 1.0000000 -0.4798395
b -0.4798395 1.0000000
>
> ex_corr_lm(f_lm, vars=c('a', 'b')) ## explicitly specify vars
a b
a 1.0000000 -0.4798395
b -0.4798395 1.0000000
>
> ex_corr_lm(f_glm)
a b
a 1.0000000 -0.4798395
b -0.4798395 1.0000000
>
> ex_corr_lm(f_lmer)
a b
a 1.0000000 -0.4798395
b -0.4798395 1.0000000
If we set data=TRUE
, we can obtain the respective columns. ex_corr_lm()
throws a list with correlations and data columns in this case.
> ex_corr_lm(f_lm, TRUE)
$corr
a b
a 1.0000000 -0.4798395
b -0.4798395 1.0000000
$corr_data
a b
1 1 1
2 2 4
3 3 9
4 4 16
5 5 25
6 1 36
7 2 49
8 3 64
9 4 81
10 5 100
We can use cor()
on them. Note, that correlations of the estimated coefficients and the correlation between the variables themselves aren't necessarily the same.
> ex_corr_lm(f_lm, TRUE)$corr_data |> cor()
a b
a 1.0000000 0.4798395
b 0.4798395 1.0000000
Data:
> set.seed(42)
> d <- data.frame(a=rep(1:5, 2), b=(1:10)^2);d$x <- .5 + 2*d$a + .1*d$b + rnorm(10,,.5)
> f_lm <- lm(x ~ a + b, d,)
> f_glm <- glm(x ~ a + b, d, fam=gaussian())
> f_lmer <- lme4::lmer(x ~ a + b + (1|a), d)