I am testing for outliers using the iris dataset and the following model:
mod <- lm(Sepal.Width ~ Sepal.Length*Species, data = iris)
I use rstudent() to calculate the studentized residuals, and add an indicator for whether each value falls outside the range [-2, 2].
iris2 <-
  iris |>
  mutate(res_stud = rstudent(mod),
         res_stud_large = as.numeric(!between(res_stud, -2, 2)))
but I get this error:
Error in `mutate()`:
ℹ In argument: `res_stud_large = as.numeric(!between(res_stud, -2, 2))`.
Caused by error:
! length(g) must match nrow(X)
Backtrace:
1. dplyr::mutate(...)
13. base::stop(`<Rcpp::xc>`)
I checked the structure of the result:
str(rstudent(mod))
Named num [1:150] -0.0113 -1.2776 0.0609 -0.0142 0.6545 ...
- attr(*, "names")= chr [1:150] "1" "2" "3" "4" ...
Is this perhaps the reason for the error?
I tried using the subset() function, but without success.
I think there may be something else going on here. Using just dplyr and the iris data, it works:
library(dplyr)
mod <- lm(Sepal.Width ~ Sepal.Length*Species, data = iris)
iris2 <-
  iris |>
  mutate(res_stud = rstudent(mod),
         res_stud_large = as.numeric(!between(res_stud, -2, 2)))
This works because the iris data are complete (no NA values). If we impose a missing value, you'll see that it fails in much the same way as your example:
iris$Species[1] <- NA
mod <- lm(Sepal.Width ~ Sepal.Length*Species, data = iris)
iris2 <-
  iris |>
  mutate(res_stud = rstudent(mod),
         res_stud_large = as.numeric(!between(res_stud, -2, 2)))
#> Error in `mutate()`:
#> ℹ In argument: `res_stud = rstudent(mod)`.
#> Caused by error:
#> ! `res_stud` must be size 150 or 1, not 149.
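A quick way to check whether this is what happened is to compare the number of rows in the data with the number of residuals the model returns. The sketch below is only illustrative: iris_na and mod_na are hypothetical names, and it assumes the default na.action (na.omit) has not been changed via options().

```r
# iris_na is a hypothetical copy, so the built-in iris data stay untouched
iris_na <- iris
iris_na$Species[1] <- NA

# With the default na.action (assumed here to be na.omit),
# lm() silently drops the incomplete row before fitting
mod_na <- lm(Sepal.Width ~ Sepal.Length * Species, data = iris_na)

n_rows <- nrow(iris_na)            # rows in the data
n_res  <- length(rstudent(mod_na)) # residuals returned by the model
dropped <- mod_na$na.action        # indices of the rows lm() removed

n_rows
n_res
dropped
```

If n_rows and n_res differ, mutate() cannot recycle the residuals into a column, and dropped shows which rows were left out of the fit.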
If you estimate the model with na.action = na.exclude, then when R returns things like fitted values or residuals, it will include NA values for the cases that were not used in the analysis, so that the output is the same size as the original input.
mod2 <- lm(Sepal.Width ~ Sepal.Length*Species, data = iris,
           na.action = na.exclude)
iris2 <- iris |>
  mutate(res_stud = rstudent(mod2),
         res_stud_large = as.numeric(!between(res_stud, -2, 2)))
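To see the padding directly, here is a minimal base-R sketch that refits the model with na.exclude and inspects the result. The names iris_na, mod_ex, res and flag are hypothetical, and abs(res) > 2 is used as a base-R stand-in for !between(res_stud, -2, 2).

```r
# iris_na is a hypothetical copy with one missing value imposed
iris_na <- iris
iris_na$Species[1] <- NA

# na.exclude keeps track of the dropped row so residuals can be padded with NA
mod_ex <- lm(Sepal.Width ~ Sepal.Length * Species, data = iris_na,
             na.action = na.exclude)

res <- rstudent(mod_ex)

length(res)      # same length as nrow(iris_na)
sum(is.na(res))  # number of padded (unused) cases

# Base-R equivalent of the indicator; NA residuals give an NA flag
flag <- as.numeric(abs(res) > 2)
```

Note that the row that was excluded from the fit gets NA for both the residual and the indicator, so you may want to decide explicitly how to treat those rows (for example with is.na()) before counting outliers.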
I wonder if something like this happened along the way that wasn't documented in your example?
Created on 2025-03-16 with reprex v2.1.1.9000