Tags: r, dplyr, purrr, nse, across

How can I mutate using across() and a dynamically generated list of functions?


I have a data frame that holds a number of statistics for an exam, tracked over different years and groups. I would like to write a function that adds new columns giving the change in these statistics within each group, relative to each year in a dynamically supplied list of reference years.

Here is an example of the output I would like.

grades <- data.frame(
  Group = c(rep("A", 4), rep("B", 4)),
  Year  = rep(seq(2015, 2018), 2),
  Mean  = c(seq(100, 130, 10), seq(200, 260, 20)),
  PassR = c(seq(0.5, 0.53, 0.01), seq(0.6, 0.66, 0.02))
)

grades |> group_by(Group) |> calculateDifferences(c(2015, 2016))

# A tibble: 8 × 8
# Groups:   Group [2]
  Group  Year  Mean PassR Mean_Diff2015 Mean_Diff2016 PassR_Diff2015 PassR_Diff2016
  <chr> <int> <dbl> <dbl>         <dbl>         <dbl>          <dbl>          <dbl>
1 A      2015   100  0.5              0           -10         0             -0.0100
2 A      2016   110  0.51            10             0         0.0100         0     
3 A      2017   120  0.52            20            10         0.0200         0.0100
4 A      2018   130  0.53            30            20         0.0300         0.0200
5 B      2015   200  0.6              0           -20         0             -0.0200
6 B      2016   220  0.62            20             0         0.0200         0     
7 B      2017   240  0.64            40            20         0.0400         0.0200
8 B      2018   260  0.66            60            40         0.0600         0.0400

My best attempt is the following function, but it fails because the Year column is not in scope inside the functions in the generated list.

library(dplyr)
library(purrr)    # map(), set_names()
library(stringr)  # str_c()

# Calculate differences from the given years for both mean and pass rate
calculateDifferences <- function(data, diffYears) {
  mutate(data,
    across(
      any_of(c("Mean", "PassR")),
      #list(Diff2015 = function(col) col - col[Year == 2015],
      #     Diff2016 = function(col) col - col[Year == 2016]),
      # This is the part that fails: object 'Year' not found
      map(as.list(diffYears), function(year) { function(col) col - col[Year == year] }) |>
        set_names(str_c("Diff", diffYears)),
      .names = "{.col}_{.fn}"
    )
  )
}

Running this code fails with an error saying that object Year cannot be found. I've tried introducing some NSE to delay evaluation of the variable, but neither !!substitute("Year") nor !!quo("Year") produces the desired output: each merely throws a dplyr::mutate_incompatible_size error about a <named_list>. Replacing it with .data[["Year"]] fails with a complaint that .data is not being used in a data-masking context.
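The "cannot find Year" error is ordinary lexical scoping: an R function looks up free variables such as Year in the environment where it was created, not among the columns being mutated. Judging by the error, the closures returned by map() end up being created outside the data mask, whereas function literals written directly inside across() are not. A minimal sketch of the difference, using a made-up two-row data frame `df` and a hypothetical function `diff_from_2015`:

```r
library(dplyr)

df <- data.frame(Year = c(2015, 2016), x = c(1, 3))

# Created outside mutate(): R looks up the free variable `Year` in the
# environment where this function was defined (here, the top level),
# not among the columns of `df`, so calling it inside across() errors.
diff_from_2015 <- function(col) col - col[Year == 2015]
outside <- tryCatch(
  mutate(df, across(x, diff_from_2015)),
  error = function(e) e
)

# The same body written inline as a formula lambda is evaluated where the
# data columns are visible, so `Year` resolves to df$Year.
inline <- mutate(df, across(x, ~ .x - .x[Year == 2015]))
inline$x
#> [1] 0 2
```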

If I hard-code the years (as in the commented-out section of the function), this runs correctly and produces the desired output, but it cannot adapt to a dynamically supplied list of years.
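Since the hard-coded version works, another option (a swapped-in metaprogramming technique, not the accepted answer's method) is to generate those hard-coded expressions programmatically with rlang and splice them into a single mutate() call with !!!. A minimal sketch, where `calculateDifferencesExprs` and its `vars` argument are hypothetical names:

```r
library(dplyr)
library(rlang)

# Hypothetical helper (not from the question or the answer): build an
# expression such as Mean - Mean[Year == 2015] for every variable/year
# pair and splice all of them into one mutate() call.
calculateDifferencesExprs <- function(data, diffYears, vars = c("Mean", "PassR")) {
  new_cols <- list()
  for (v in vars) {
    col <- sym(v)
    for (year in diffYears) {
      # !!year inlines the numeric value, so `year` is never looked up later;
      # `Year` stays a bare symbol and is resolved in the data mask, i.e.
      # within the current group when the data is grouped.
      new_cols[[paste0(v, "_Diff", year)]] <-
        expr((!!col) - (!!col)[Year == !!year])
    }
  }
  mutate(data, !!!new_cols)
}
```

Called as `grades |> group_by(Group) |> calculateDifferencesExprs(c(2015, 2016))`, this should produce the same new columns, in the same order, as the desired output shown in the question.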

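I also tried pulling the Year column out separately with data[["Year"]]. This works when the data is ungrouped, but falls apart when the data is grouped, because data[["Year"]] is always the full column while col holds only the current group's rows.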


Solution

  • Here is an alternative approach that returns a tibble from the function inside across() and then unpacks it into separate columns using the .unpack argument. I've also altered the function so that both the variables and the grouping column are passed as arguments instead of being hard-coded (which also lets you use tidyselect features if desired).

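    library(purrr)   # map(), list_cbind() (purrr >= 1.0.0)
    library(dplyr)   # .by and .unpack need dplyr >= 1.1.0
    
    calculateDifferences <- function(data, vars, diffYears, group = Group) {
      data |>
        mutate(
          across(
            {{ vars }},
            # For each reference year, build a one-column tibble named
            # "Diff<year>", then bind those tibbles side by side.
            ~ map(diffYears, \(year) tibble("Diff{year}" := .x - .x[Year == year])) |>
              list_cbind(),
            # Unpack each column's tibble into "<col>_Diff<year>" columns
            .unpack = TRUE
          ),
          .by = {{ group }}
        )
    }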
    
    grades |>
      calculateDifferences(c(Mean, PassR), c(2015, 2016)) 
    
      Group Year Mean PassR Mean_Diff2015 Mean_Diff2016 PassR_Diff2015 PassR_Diff2016
    1     A 2015  100  0.50             0           -10           0.00          -0.01
    2     A 2016  110  0.51            10             0           0.01           0.00
    3     A 2017  120  0.52            20            10           0.02           0.01
    4     A 2018  130  0.53            30            20           0.03           0.02
    5     B 2015  200  0.60             0           -20           0.00          -0.02
    6     B 2016  220  0.62            20             0           0.02           0.00
    7     B 2017  240  0.64            40            20           0.04           0.02
    8     B 2018  260  0.66            60            40           0.06           0.04
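The key mechanism in this answer is that when the function given to across() returns a data frame, .unpack = TRUE splits it into ordinary columns, named with the default glue specification "{outer}_{inner}". Stripped of the year logic, a minimal sketch on a made-up data frame (`df`, `double` and `half` are illustrative names only):

```r
library(dplyr)

df <- data.frame(x = c(1, 3), y = c(10, 20))

# Each call of the lambda returns a two-column tibble; .unpack = TRUE turns
# every packed result into separate columns named "{outer}_{inner}",
# e.g. x_double and x_half, while leaving x and y themselves untouched.
out <- mutate(
  df,
  across(c(x, y), ~ tibble(double = .x * 2, half = .x / 2), .unpack = TRUE)
)
names(out)
```

This mirrors how Mean and PassR survive alongside the new Mean_Diff and PassR_Diff columns in the output above.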