set.seed(0)
## 2 responses with 10 observations each
response <- matrix(rnorm(20), 10, 2)
## 3 covariates with 10 observations each
predictors <- matrix(rnorm(30), 10, 3)
fit <- lm(response ~ predictors)
I have been generating residual plots for the entire model using:
plot(fitted(fit),residuals(fit))
However, I would like to make individual plots for each predictor covariate. I can do them one at a time by:
f <- fitted(fit)
r <- residuals(fit)
plot(f[,1],r[,1])
The issue with this approach, however, is that it needs to generalize to data sets with more predictor covariates. Is there a way to use plot() while iterating through each column of f and r? Or is there a way that plot() can group each covariate by colour?
Make sure you are using standardised residuals rather than raw residuals
I often see plot(fitted(fit), residuals(fit)), but it is statistically wrong. We use plot(fit) to generate diagnostic plots, because we need standardised residuals rather than raw ones. Read ?plot.lm for more. But the plot method for "mlm" is poorly supported:
plot(fit)
# Error: 'plot.mlm' is not implemented yet
Define "rstandard" S3 method for "mlm"
plot.mlm is not supported for many reasons, one of which is the lack of rstandard.mlm. For the "lm" and "glm" classes, there is an S3 generic function "rstandard" to get standardised residuals:
methods(rstandard)
# [1] rstandard.glm* rstandard.lm*
There is no support for "mlm". So we shall fill this gap first.
It is not difficult to get standardised residuals. Let hii be the diagonal of the hat matrix; the point-wise estimated standard error for residuals is sqrt(1 - hii) * sigma, where sigma = sqrt(RSS / df.residual) is the estimated residual standard error. RSS is the residual sum of squares; df.residual is the residual degrees of freedom.
hii can be computed from the matrix factor Q of the QR factorization of the model matrix: rowSums(Q ^ 2). For "mlm", there is only one QR decomposition since the model matrix is the same for all responses, hence we only need to compute hii once.
Each response has its own sigma, but the residual variances sigma ^ 2 are obtained elegantly in one go: colSums(residuals(fit) ^ 2) / df.residual(fit).
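As a quick sanity check (my addition, not part of the original answer), the formula above reproduces stats::rstandard() for an ordinary single-response "lm"; here lm.influence() supplies the hat diagonals hii:

```r
## Sanity check: residuals / (sqrt(1 - hii) * sigma) matches rstandard()
## for a single-response "lm"
set.seed(1)
y <- rnorm(10); x <- rnorm(10)
m <- lm(y ~ x)
hii <- lm.influence(m, do.coef = FALSE)$hat              ## diagonal of hat matrix
sigma <- sqrt(sum(residuals(m) ^ 2) / df.residual(m))    ## residual standard error
all.equal(unname(residuals(m) / (sqrt(1 - hii) * sigma)),
          unname(rstandard(m)))
# [1] TRUE
```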
Now, let's wrap up those ideas to get our own "rstandard" method for "mlm":
## define our own "rstandard" method for "mlm" class
rstandard.mlm <- function (model) {
Q <- with(model, qr.qy(qr, diag(1, nrow = nrow(qr$qr), ncol = qr$rank))) ## Q matrix
hii <- rowSums(Q ^ 2) ## diagonal of hat matrix QQ'
RSS <- colSums(model$residuals ^ 2) ## residual sums of squares (for each model)
sigma <- sqrt(RSS / model$df.residual)  ## residual standard error (for each model)
pointwise_sd <- outer(sqrt(1 - hii), sigma) ## point-wise residual standard error (for each model)
model$residuals / pointwise_sd ## standardised residuals
}
Note the use of .mlm in the function name; that tells R this is the S3 method associated with the "mlm" class. Once we have defined this function, it shows up among the "rstandard" methods:
## now there are method for "mlm"
methods(rstandard)
# [1] rstandard.glm* rstandard.lm* rstandard.mlm
To call this function, we don't have to explicitly call rstandard.mlm; calling the generic rstandard is enough:
## test with your fitted model `fit`
rstandard(fit)
# [,1] [,2]
#1 1.56221865 2.6593505
#2 -0.98791320 -1.9344546
#3 0.06042529 -0.4858276
#4 0.18713629 2.9814135
#5 0.11277397 1.4336484
#6 -0.74289985 -2.4452868
#7 0.03690363 0.7015916
#8 -1.58940448 -1.2850961
#9 0.38504435 1.3907223
#10 1.34618139 -1.5900891
Standardised residuals are roughly N(0, 1) distributed.
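As a further cross-check (my addition): fitting each response on its own and calling stats::rstandard() gives the same numbers, since all responses share one model matrix. The rstandard.mlm() definition is repeated here only to keep the snippet self-contained:

```r
## Cross-check: rstandard() on the "mlm" fit matches rstandard() on
## each response fitted separately (data as in the question)
rstandard.mlm <- function (model) {
  Q <- with(model, qr.qy(qr, diag(1, nrow = nrow(qr$qr), ncol = qr$rank)))
  hii <- rowSums(Q ^ 2)                     ## diagonal of hat matrix
  RSS <- colSums(model$residuals ^ 2)       ## residual sums of squares
  sigma <- sqrt(RSS / model$df.residual)    ## residual standard error per model
  model$residuals / outer(sqrt(1 - hii), sigma)
}

set.seed(0)
response <- matrix(rnorm(20), 10, 2)
predictors <- matrix(rnorm(30), 10, 3)
fit <- lm(response ~ predictors)

r_mlm <- rstandard(fit)   ## dispatches to rstandard.mlm()
r_sep <- sapply(1:2, function (j) rstandard(lm(response[, j] ~ predictors)))
all.equal(unname(r_mlm), unname(r_sep))
# [1] TRUE
```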
Getting a residuals vs. fitted plot for "mlm"
Your initial try with:
f <- fitted(fit); r <- rstandard(fit); plot(f, r)
is not a bad idea, provided that dots for different models can be distinguished from each other. So we can try using different point colours for different models:
plot(f, r, col = as.numeric(col(f)), pch = 19)
Graphical arguments like col, pch and cex can take vector input. I ask plot to use col = j for r[, j] ~ f[, j], where j = 1, 2, ..., ncol(f). Read the "Color Specification" section of ?par for what col = j means. pch = 19 tells plot to draw solid dots. Read basic graphical parameters for various choices.
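To see why col = as.numeric(col(f)) colours points by model (my illustration, not from the original answer): col(f) returns each element's column index, and a matrix unrolls column-by-column, so the colour vector is constant within each column:

```r
## as.numeric(col(f)) for a small 3 x 2 matrix: column indices,
## unrolled column-major, i.e. one colour per model
f <- matrix(0, nrow = 3, ncol = 2)
as.numeric(col(f))
# [1] 1 1 1 2 2 2
```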
Finally you may want a legend. You can do
plot(f, r, col = as.numeric(col(f)), pch = 19, ylim = c(-3, 4))
legend("topleft", legend = paste0("response ", 1:ncol(f)), pch = 19,
col = 1:ncol(f), text.col = 1:ncol(f))
In order to leave space for the legend box, we extend ylim a little. As standardised residuals are roughly N(0, 1), ylim = c(-3, 3) is a good range; since we want to place the legend box at the top left, we extend ylim to c(-3, 4). You can customise your legend even more via ncol, title, etc.
How many responses do you have?
If you have no more than a few responses, the suggestion above works nicely. If you have plenty, plotting them in separate plots is suggested. A for loop, as you found out, is decent, except that you need to split the plotting region into subplots, possibly using par(mfrow = c(?, ?)). Also set the inner margin mar and outer margin oma if you take this approach. You may read How to produce a nicer plot for my categorical time series data in a matrix? for one example of doing this.
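A minimal sketch of that loop (my addition). It is self-contained using the question's data; the standardised residuals come from rstandard() on each response fitted separately, which is equivalent to rstandard.mlm() above since all responses share the same model matrix:

```r
## One residuals-vs-fitted subplot per response, generalising to any
## number of responses
set.seed(0)
response <- matrix(rnorm(20), 10, 2)
predictors <- matrix(rnorm(30), 10, 3)
fit <- lm(response ~ predictors)
f <- fitted(fit)
op <- par(mfrow = c(1, ncol(f)), mar = c(4, 4, 2, 1))
for (j in seq_len(ncol(f))) {
  r_j <- rstandard(lm(response[, j] ~ predictors))  ## standardised residuals
  plot(f[, j], r_j, pch = 19,
       xlab = "fitted", ylab = "standardised residuals",
       main = paste0("response ", j))
  abline(h = 0, lty = 2)  ## reference line at zero
}
par(op)  ## restore graphical parameters
```

With more responses, replace c(1, ncol(f)) with a sensible grid, e.g. par(mfrow = c(2, 3)).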
If you have even more responses, you might want a mixture of both: say, with 42 responses, you can do par(mfrow = c(2, 3)), then plot 7 responses (distinguished by colour) in each subfigure. Now the solution is more opinion based.