I have a function to calculate a modified z score of a variable like the following:
calculate_modified_z_score <- function(x) {
  median_x <- weighted.median(x, w = df$weight, na.rm = TRUE)
  mad_x <- mad(x, constant = 1, na.rm = TRUE) # median absolute deviation with a scaling factor of 1, excluding NA values
  if (mad_x == 0) {
    meanAD_x <- mean(abs(x - median_x), na.rm = TRUE) # mean absolute deviation from the median, excluding NA values
    return((x - median_x) / (1.253314 * meanAD_x))
  } else {
    return((x - median_x) / (1.486 * mad_x))
  }
}
I run this on my dataframe in my code like the following:
df %>% mutate(z_of_var = calculate_modified_z_score(var))
This works. However, I want to perform this operation by group, so that the weighted median is computed within each level of group_var. The problem is that the weight variable no longer works, because df$weight has a different length than the per-group x. So when I run the following, I receive the error:
df %>% group_by(group_var) %>% mutate(z_of_var = calculate_modified_z_score(var))
Error in `mutate()`:
ℹ In argument: `z_of_var = calculate_modified_z_score(var)`.
ℹ In group 1: `group_var = "1"`.
Caused by error in `weighted.quantile()`:
! length(x) == length(w) is not TRUE
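The length mismatch can be reproduced without dplyr. The sketch below uses a made-up data frame named toy (not the real df) and only base R: each group passes a short slice of var to the function, while a hard-coded toy$weight is always the full column.

```r
# Toy data, made up for illustration (not the original df)
toy <- data.frame(
  group_var = c("a", "a", "b", "b", "b"),
  var       = c(1, 2, 3, 4, 5),
  weight    = c(1, 1, 2, 2, 2)
)

# What a grouped call hands to the function: one slice of var per group
slice_lengths <- vapply(split(toy$var, toy$group_var), length, integer(1))

# What toy$weight evaluates to inside the function: always the full column
full_length <- length(toy$weight)

print(slice_lengths)
print(full_length)
```

No group's slice has the same length as the full weight column, which is the condition that weighted.quantile() checks and rejects.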
I get why this doesn't work, but if I drop the df$ prefix inside the function, it fails as well. If I do this:
calculate_modified_z_score <- function(x) {
  median_x <- weighted.median(x, w = weight, na.rm = TRUE)
  mad_x <- mad(x, constant = 1, na.rm = TRUE) # median absolute deviation with a scaling factor of 1, excluding NA values
  if (mad_x == 0) {
    meanAD_x <- mean(abs(x - median_x), na.rm = TRUE) # mean absolute deviation from the median, excluding NA values
    return((x - median_x) / (1.253314 * meanAD_x))
  } else {
    return((x - median_x) / (1.486 * mad_x))
  }
}
df %>% group_by(group_var) %>% mutate(z_of_var = calculate_modified_z_score(var))
I receive the error:
Error in `mutate()`:
ℹ In argument: `z_po_mil = calculate_modified_z_score(var)`.
ℹ In group 1: `group_var = "a"`.
Caused by error in `calculate_modified_z_score()`:
! object 'weight' not found
Backtrace:
1. ... %>% select(z_po_mil)
10. global calculate_modified_z_score(var)
11. spatstat.geom::weighted.median(x, w = vote, na.rm = TRUE)
13. spatstat.geom::weighted.quantile(...)
14. base::as.vector(w)
How can I apply this function by group, using the weights of the observations within each group?
Passing the weights to the function as a second argument did the trick! Much simpler than I had thought.
calculate_modified_z_score <- function(x, y) {
  median_x <- weighted.median(x, w = y, na.rm = TRUE) # y holds the weights for x
  mad_x <- mad(x, constant = 1, na.rm = TRUE) # median absolute deviation with a scaling factor of 1, excluding NA values
  if (mad_x == 0) {
    meanAD_x <- mean(abs(x - median_x), na.rm = TRUE) # mean absolute deviation from the median, excluding NA values
    return((x - median_x) / (1.253314 * meanAD_x))
  } else {
    return((x - median_x) / (1.486 * mad_x))
  }
}
df %>% group_by(group_var) %>% mutate(z_of_var = calculate_modified_z_score(var, weight))
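A minimal sketch of why passing the weights as an argument works under group_by(): inside mutate() or summarise(), bare column names refer to the current group's rows, so both arguments arrive with the same length. The data frame toy is made up, and simple_weighted_median is a hypothetical base-R stand-in for spatstat.geom::weighted.median so the example runs without spatstat. Its results may differ from spatstat's version, which can interpolate.

```r
library(dplyr)

# Hypothetical stand-in for spatstat.geom::weighted.median (an assumption for
# this sketch): the smallest value whose cumulative weight reaches half of the
# total weight.
simple_weighted_median <- function(x, w) {
  keep <- !is.na(x) & !is.na(w)
  x <- x[keep]
  w <- w[keep]
  ord <- order(x)
  cum_w <- cumsum(w[ord])
  x[ord][which(cum_w >= sum(w) / 2)[1]]
}

# Toy data, made up for illustration (not the original df)
toy <- data.frame(
  group_var = c("a", "a", "a", "b", "b", "b"),
  var       = c(1, 2, 10, 5, 6, 7),
  weight    = c(1, 1, 5, 1, 1, 1)
)

group_medians <- toy %>%
  group_by(group_var) %>%
  summarise(
    n_var    = length(var),     # rows of var seen in this group
    n_weight = length(weight),  # rows of weight seen in this group
    w_median = simple_weighted_median(var, weight)
  )

print(group_medians)
```

In each group n_var equals n_weight, so the length check that failed with df$weight passes. The same holds when the function is called inside a grouped mutate().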