I am trying to modify the `identify_outliers()` function in the rstatix package so that `is_outlier()` can use any coefficient when determining outliers. Here is the code for `identify_outliers()`:
```r
function (data, ..., variable = NULL)
{
  is.outlier <- NULL
  if (is_grouped_df(data)) {
    results <- data %>% doo(identify_outliers, ..., variable = variable)
    if (nrow(results) == 0)
      results <- as.data.frame(results)
    return(results)
  }
  if (!inherits(data, "data.frame"))
    stop("data should be a data frame")
  variable <- data %>% get_selected_vars(..., vars = variable)
  n.vars <- length(variable)
  if (n.vars > 1)
    stop("Specify only one variable")
  values <- data %>% pull(!!variable)
  results <- data %>%
    mutate(is.outlier = is_outlier(values), is.extreme = is_extreme(values)) %>%
    filter(is.outlier == TRUE)
  if (nrow(results) == 0)
    results <- as.data.frame(results)
  results
}
```
Here I've created a function called `crazy_outliers()` by modifying `identify_outliers()`. I've removed the parts pertaining to `is_extreme()`, as I don't need them, and I've added an argument `y` so I can pass a coefficient to `is_outlier()`:
```r
crazy_outliers <- function (data, ..., variable = NULL, y) # added y argument
{
  is.outlier <- NULL
  if (is_grouped_df(data)) {
    # changed identify_outliers to crazy_outliers and added y argument
    results <- data %>% doo(crazy_outliers, ..., variable = variable, y = y)
    if (nrow(results) == 0)
      results <- as.data.frame(results)
    return(results)
  }
  if (!inherits(data, "data.frame"))
    stop("data should be a data frame")
  variable <- data %>% get_selected_vars(..., vars = variable)
  n.vars <- length(variable)
  if (n.vars > 1)
    stop("Specify only one variable")
  values <- data %>% pull(!!variable)
  # Here I use the y argument to specify the coefficient for determining outliers
  results <- data %>% mutate(is.outlier = is_outlier(values, coef = y))
  if (nrow(results) == 0)
    results <- as.data.frame(results)
  results
}
```
However, I receive the following error when trying to use the function:
```
Error in `mutate()`:
ℹ In argument: `data = map(.data$data, .f, ...)`.
Caused by error in `map()`:
ℹ In index: 1.
Caused by error in `get_selected_vars()`:
! could not find function "get_selected_vars"
```
I haven't modified the `get_selected_vars()` function or its arguments at all, and it is used in the original `identify_outliers()` function, so I'm confused as to what's going on. I also cannot find which package it comes from: when I replace it with `rstatix::get_selected_vars`, I still cannot get the function to work. Any advice is appreciated, thank you!
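`get_selected_vars()` is an unexported utility function from rstatix. Functions can be defined and used inside a package without being made available to the package's users: only objects explicitly exported in the NAMESPACE are. `identify_outliers()` can call it because a function defined in a package has the package namespace as its enclosing environment, so name lookup inside it also finds unexported objects. You are presumably writing your `crazy_outliers()` function in an R script or notebook, not editing and loading the package itself, so it is defined in the global environment and cannot see `get_selected_vars()`. This is also why `rstatix::get_selected_vars` fails: `::` only reaches exported objects.

You can access the internal function directly with `:::`, e.g. `rstatix:::get_selected_vars()`, but this is risky, since packages may change how internal utility functions are defined with little notice. Alternatively, you can inline your own version of the function.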
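A minimal sketch of the "inline your own version" route, assuming dplyr and rstatix are installed. `select_one_var()` is a hypothetical helper written for this example; it only handles the two cases that matter here (bare column names in `...`, or a column name string in `variable`) and is not a copy of rstatix's internal `get_selected_vars()`. The coefficient argument is named `coef` rather than `y`. The `filter()` step, which the question's version dropped together with `is_extreme()`, is kept so that only outlying rows are returned, as `identify_outliers()` does.

```r
library(dplyr)
library(rstatix)  # exports is_outlier() and doo(); get_selected_vars() is internal

# Hypothetical stand-in for rstatix's internal get_selected_vars():
# returns the name of the single column chosen either via `...`
# (bare column names, tidy-select style) or via the `variable` string.
select_one_var <- function(data, ..., variable = NULL) {
  if (is.null(variable)) {
    variable <- names(dplyr::select(data, ...))
  }
  if (length(variable) != 1) {
    stop("Specify exactly one variable")
  }
  variable
}

# Like identify_outliers(), but with a user-supplied IQR coefficient.
# `coef` defaults to 1.5, the default of rstatix::is_outlier().
crazy_outliers <- function(data, ..., variable = NULL, coef = 1.5) {
  if (dplyr::is_grouped_df(data)) {
    # Apply crazy_outliers() to each group, as identify_outliers() does
    results <- data %>%
      rstatix::doo(crazy_outliers, ..., variable = variable, coef = coef)
    if (nrow(results) == 0) results <- as.data.frame(results)
    return(results)
  }
  if (!inherits(data, "data.frame")) stop("data should be a data frame")
  variable <- select_one_var(data, ..., variable = variable)
  values <- data[[variable]]
  results <- data %>%
    mutate(is.outlier = rstatix::is_outlier(values, coef = coef)) %>%
    filter(is.outlier)  # keep only the rows flagged as outliers
  if (nrow(results) == 0) results <- as.data.frame(results)
  results
}

# Small made-up data set: only score = 40 lies far from the rest
scores_df <- data.frame(
  id = 1:11,
  score = c(10, 11, 12, 11, 10, 12, 11, 13, 12, 11, 40)
)
crazy_outliers(scores_df, score)             # flags the row with score = 40
crazy_outliers(scores_df, score, coef = 30)  # a large coefficient flags nothing
```

Defining the helper yourself means the function no longer depends on rstatix internals; only the exported `is_outlier()` and `doo()` are used.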