r weighted z-score

Apply modified z score function by group with weight


I have a function to calculate a modified z score of a variable like the following:

calculate_modified_z_score <- function(x) {
  median_x <- weighted.median(x, w = df$weight, na.rm = TRUE)
  mad_x <- mad(x, constant = 1, na.rm = TRUE) # MAD with a scaling factor of 1, and excluding NA values

  if (mad_x == 0) {
    meanAD_x <- mean(abs(x - median_x), na.rm = TRUE) # mean absolute deviation from the median, excluding NA values
    return((x - median_x) / (1.253314 * meanAD_x))
  } else {
    return((x - median_x) / (1.486 * mad_x))
  }
}

I run this on my dataframe in my code like the following:

df %>% mutate(z_of_var = calculate_modified_z_score(var))

This works. However, I want to perform this operation by group, so that the weighted median is computed within each level of group_var. The problem is that df$weight always refers to the whole column, so its length no longer matches the per-group subset of var. When I run the grouped version, I receive the following error:

df %>% group_by(group_var) %>% mutate(z_of_var = calculate_modified_z_score(var))

Error in `mutate()`:
ℹ In argument: `z_of_var = calculate_modified_z_score(var)`.
ℹ In group 1: `group_var = "1"`.
Caused by error in `weighted.quantile()`:
! length(x) == length(w) is not TRUE
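
To illustrate the length mismatch, here is a minimal sketch on made-up data. The `toy` data frame and its values are hypothetical, not the real data. It compares the length of a column referenced through the data frame with the length of the same column referenced by its bare name inside a grouped dplyr verb.

```r
library(dplyr)

# Hypothetical toy data: 5 rows in two groups
toy <- data.frame(
  group_var = c("a", "a", "a", "b", "b"),
  var       = c(1, 2, 10, 4, 6),
  weight    = c(1, 1, 2, 3, 1)
)

sizes <- toy %>%
  group_by(group_var) %>%
  summarise(
    n_var       = length(var),         # var is sliced to the current group
    n_df_weight = length(toy$weight),  # toy$weight is always the full column
    n_weight    = length(weight)       # a bare column name is sliced per group
  )

print(sizes)
```

Within each group, `n_var` and `n_weight` agree, while `n_df_weight` stays at the full number of rows, which is the situation the `length(x) == length(w)` check rejects.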

I get why this doesn't work, but if I drop the df$ prefix inside the function, it fails as well. If I do this:


calculate_modified_z_score <- function(x) {
  median_x <- weighted.median(x, w = weight, na.rm = TRUE)
  mad_x <- mad(x, constant = 1, na.rm = TRUE) # MAD with a scaling factor of 1, and excluding NA values

  if (mad_x == 0) {
    meanAD_x <- mean(abs(x - median_x), na.rm = TRUE) # mean absolute deviation from the median, excluding NA values
    return((x - median_x) / (1.253314 * meanAD_x))
  } else {
    return((x - median_x) / (1.486 * mad_x))
  }
}

df %>% group_by(group_var) %>% mutate(z_of_var = calculate_modified_z_score(var))

I receive the error:


Error in `mutate()`:
ℹ In argument: `z_po_mil = calculate_modified_z_score(var)`.
ℹ In group 1: `group_var = "a"`.
Caused by error in `calculate_modified_z_score()`:
! object 'weight' not found
Backtrace:
  1. ... %>% select(z_po_mil)
 10. global calculate_modified_z_score(var)
 11. spatstat.geom::weighted.median(x, w = vote, na.rm = TRUE)
 13. spatstat.geom::weighted.quantile(...)
 14. base::as.vector(w)
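
The "object not found" error is a scoping issue: a name used inside a function body is looked up in the function's own environment and then in the environment where the function was defined, not among the columns of the data frame that `mutate()` is working on. The sketch below reproduces this on hypothetical data. The helper names `uses_free_variable` and `uses_argument` are made up for illustration, and it assumes no object called `weight` exists in the global environment.

```r
library(dplyr)

# Hypothetical toy data
toy <- data.frame(
  g      = c("a", "a", "b"),
  var    = c(1, 2, 3),
  weight = c(1, 2, 3)
)

# `weight` is a free variable here: R searches the function's enclosing
# environments for it and never sees the data frame column
uses_free_variable <- function(x) sum(x * weight)

# Here the weights arrive as an argument, evaluated by mutate() per group
uses_argument <- function(x, w) sum(x * w)

# Fails as long as no global object named `weight` exists
res_free <- tryCatch(
  toy %>% group_by(g) %>% mutate(s = uses_free_variable(var)),
  error = function(e) "error"
)

# Works: both var and weight are subset to the current group
res_arg <- toy %>%
  group_by(g) %>%
  mutate(s = uses_argument(var, weight)) %>%
  ungroup()
```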

How can I apply this function by group so that each group uses only the weights of its own observations?


Solution

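  • Passing the weights as a second argument did the trick! Much simpler than I had thought. Inside mutate(), weight is evaluated as a column, so on a grouped data frame it is subset to the current group just like var.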

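    library(dplyr)
    library(spatstat.geom) # provides weighted.median()

    # x: values; w: weights, same length as x
    calculate_modified_z_score <- function(x, w) {
      median_x <- weighted.median(x, w = w, na.rm = TRUE)
      mad_x <- mad(x, constant = 1, na.rm = TRUE) # unweighted MAD with a scaling factor of 1, excluding NA values

      if (mad_x == 0) {
        meanAD_x <- mean(abs(x - median_x), na.rm = TRUE) # mean absolute deviation from the median, excluding NA values
        return((x - median_x) / (1.253314 * meanAD_x))
      } else {
        return((x - median_x) / (1.486 * mad_x))
      }
    }

    # group first, so that var and weight are both subset per level of group_var
    df %>% group_by(group_var) %>% mutate(z_of_var = calculate_modified_z_score(var, weight))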
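
As a sanity check of the two-argument approach, the sketch below runs a grouped version on made-up data and compares one group against a manual calculation. To keep it runnable with dplyr alone, it swaps in `simple_weighted_median`, a hypothetical stand-in written for this example. It is not the same as `spatstat.geom::weighted.median()`, which can interpolate between values. The `toy` data and the name `modified_z` are likewise assumptions. As in the code above, `mad()` stays unweighted.

```r
library(dplyr)

# Hypothetical stand-in for a weighted median: the smallest x whose
# cumulative weight reaches half of the total weight (no interpolation)
simple_weighted_median <- function(x, w) {
  keep <- !is.na(x) & !is.na(w)
  x <- x[keep]
  w <- w[keep]
  ord <- order(x)
  x <- x[ord]
  w <- w[ord]
  x[which(cumsum(w) >= sum(w) / 2)[1]]
}

# Same structure as calculate_modified_z_score(), with weights as an argument
modified_z <- function(x, w) {
  median_x <- simple_weighted_median(x, w)
  mad_x <- mad(x, constant = 1, na.rm = TRUE) # unweighted MAD
  if (mad_x == 0) {
    mean_ad_x <- mean(abs(x - median_x), na.rm = TRUE)
    (x - median_x) / (1.253314 * mean_ad_x)
  } else {
    (x - median_x) / (1.486 * mad_x)
  }
}

# Hypothetical toy data with two groups
toy <- data.frame(
  group_var = c("a", "a", "a", "b", "b", "b"),
  var       = c(1, 2, 10, 4, 6, 8),
  weight    = c(1, 1, 5, 1, 1, 1)
)

grouped <- toy %>%
  group_by(group_var) %>%
  mutate(z_of_var = modified_z(var, weight)) %>%
  ungroup()

# Manual calculation for group "a" only, for comparison
a <- toy[toy$group_var == "a", ]
manual_a <- modified_z(a$var, a$weight)
```

In group "a" the heavy weight on the value 10 pulls the weighted median to 10, so that row gets a score of 0, which would not happen if the weights of the whole data frame were used.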