r, nse

Create a new column using non-standard evaluation in R


I am working with non-standard evaluation in R. I have grouped and summarised a data frame using rlang, as explained here.

To follow the same example, I am left with a dataframe that looks like this:

  x y     q   p
1 0 7 325.8 Inf
2 1 7 317.1 Inf

The calculations that were done here are these:

mtcars %>%
    group_by(x = am) %>%
    summarize(y = sum(vs),
              q = sum(mpg),
              p = sum(mpg/vs))

Now, imagine the original code would be

mtcars %>% 
    group_by(x = am) %>% 
    summarize(y = sum(vs),
              q = sum(mpg),
              p = sum(mpg / vs)) %>%
    mutate(h = q / y)

instead.

How can I achieve this using NSE?

The problem I'm facing is that the data frame columns are now x, y, q and p, not the original mtcars column names (in this example). Therefore, if I try to use mutate with an external vector such as y_delayed <- c("h" = "mpg / vs"), it doesn't work because those columns no longer exist:

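drpexpr <- rlang::parse_exprs(y_delayed)
# Applied to the summarised data (columns x, y, q, p), not to raw mtcars
mtcars %>%
    group_by(x = am) %>%
    summarize(y = sum(vs), q = sum(mpg), p = sum(mpg / vs)) %>%
    transmute(!!!drpexpr)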

Error in `transmute()`:
ℹ In argument: `h = mpg/vs`.
Caused by error:
! object 'mpg' not found

EDIT:

I cannot manually change the y_delayed <- c("h" = "mpg / vs") to y_delayed <- c("h" = "q / y")

ADDING SOME ADDITIONAL CONTEXT:

As explained in the link I provide at the beginning of the question, the groups and the summarised expressions are given in 2 separate vectors:

x_groups <- c("x" = "am")
y_now <- c("y" = "vs", "q" = "mpg", "p" = "mpg/vs")

I then use this code provided in an answer to that question:

grpexpr <- rlang::parse_exprs(x_groups)
sexpr <- rlang::parse_exprs(y_now) |> lapply(function(x) bquote(sum(.(x))))

mtcars %>%
       group_by(!!!grpexpr) %>%
       summarize(!!!sexpr)

to get to the dataframe I have with columns x, y, q and p.
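
For readers unfamiliar with bquote(), the following is a minimal sketch of what the list of summary expressions holds after the wrapping step. It uses base R's parse() as a stand-in for rlang::parse_exprs() so that it runs without extra packages; the names parsed and sexpr_demo are hypothetical and only used for illustration.

```r
y_now <- c("y" = "vs", "q" = "mpg", "p" = "mpg/vs")

# Base-R stand-in for rlang::parse_exprs(): one expression per string,
# with the vector names carried over by lapply().
parsed <- lapply(y_now, function(s) parse(text = s)[[1]])

# bquote() inserts each parsed expression where .(x) appears, giving sum(<expr>).
sexpr_demo <- lapply(parsed, function(x) bquote(sum(.(x))))

# Show each wrapped expression as text.
vapply(sexpr_demo, deparse, character(1))
```

Each element keeps its name (y, q, p), which is why the summarised columns end up with the new names rather than the original mtcars ones.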

The problem now is that I need an additional step: a mutate using dplyr, whose column name(s) and calculations are specified in another external vector, y_delayed <- c("h" = "mpg / vs"), that I cannot change. It is called y_delayed because it has to happen after the aggregation step.

EDIT 2:

The final dataframe should look like this:

  x y     q   p  h
1 0 7 325.8 Inf  46.54286
2 1 7 317.1 Inf  45.30000

Solution

  • You need to replace the old variable names in the delayed expression with the new variable names.

    library(rlang)
    library(dplyr)
    
    # Make list of symbols to pass to substitute - need to swap names and values
    yd_vars <- lapply(as.list(setNames(names(y_now), y_now)), as.name)
    
    # Expressions to inject - lapply to vectorize if needed but also to autoname output
    yd_expr <- lapply(y_delayed, \(x) do.call(substitute, list(parse_expr(x), yd_vars)))
    

    Which results in:

    $h
    q/y
    
    mtcars %>%
      group_by(!!!grpexpr) %>%
      summarize(!!!c(sexpr, yd_expr))
    
    # A tibble: 2 × 5
          x     y     q     p     h
      <dbl> <dbl> <dbl> <dbl> <dbl>
    1     0     7  326.   Inf  46.5
    2     1     7  317.   Inf  45.3
    

    You could also do it by text substitution but it will most likely be a more brittle approach:

    # Replace each old name (matched as a whole word) with its new name, then parse
    yd_expr <- parse_exprs(setNames(
      stringr::str_replace_all(y_delayed, setNames(names(y_now), sprintf("\\b%s\\b", y_now))),
      names(y_delayed)))
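
One way the text approach might go wrong is when a new name is the same as another old column name. The sketch below uses a hypothetical mapping (y_now_demo, not the one from the question) and base R only. It emulates text replacement with sequential gsub() calls, on the assumption that str_replace_all() with a named vector also applies its patterns one after another, which is worth checking against the stringr documentation.

```r
# Hypothetical mapping in which the new name "vs" collides with an old column name.
y_now_demo <- c("vs" = "mpg", "y" = "vs")   # new name = old expression
y_delayed_demo <- c("h" = "mpg / vs")

# Symbol-level substitution: substitute() swaps all symbols in a single pass.
sym_map <- lapply(as.list(setNames(names(y_now_demo), y_now_demo)), as.name)
by_symbol <- lapply(
  y_delayed_demo,
  function(x) do.call(substitute, list(parse(text = x)[[1]], sym_map))
)

# Text-level substitution, emulated with sequential gsub() calls:
# each pattern is applied to the output of the previous replacement.
patterns <- sprintf("\\b%s\\b", y_now_demo)
by_text <- Reduce(
  function(txt, i) gsub(patterns[i], names(y_now_demo)[i], txt, perl = TRUE),
  seq_along(patterns),
  y_delayed_demo
)

deparse(by_symbol$h)  # "vs/y"
unname(by_text)       # "y / y"
```

The symbol version maps mpg to vs and vs to y independently, while the sequential text version rewrites the freshly inserted vs a second time. With the mapping from the question no such collision occurs, so both approaches give q / y there.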