r regex

Replace variables in a formula with their definitions


I have a program file and would like to translate the variables used in its formulas into descriptions of what they mean. I have a list of the variables with their definitions, and I have the formulas (sample below). The desired output replaces each variable in a formula with its definition, as in the Translation column of the example. The example is simplified; I typically have about 100 variables and 100 formulas.

I tried to find similar questions but could not find the answer.

structure(list(variable = c("cs", "csp", "cb", "cc", "ccel", 
"ccrt"), definition = c("cost of salad", "cost of soup", "cost of bread", 
"cost of chicken", "cost of celery", "cost of carrot"), formula = c("cs=cb+ccel+cc", 
"csp=cc+ccel+ccrt", NA, NA, NA, NA), Translation = c("cost of salad=cost of bread+cost of celery+cost of chicken", 
"cost of soup=cost of chicken+cost of celery+cost of carrot", 
NA, NA, NA, NA)), class = c("tbl_df", "tbl", "data.frame"), row.names = c(NA, 
-6L))

Solution

  • 1) Using language objects. It is common in R to convert expressions to language objects and then process them.

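    To do that, define a function subst which converts each formula string to a language object with str2lang, turns the definitions into a named list of symbols, replaces the variables using substitute, and deparses the result back to a character string, removing the backticks that deparse1 puts around the multi-word names and restoring NA for missing formulas.

    Using the input in the Note at the end: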

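    subst <- function(expr, defs) {
      # parse each formula string into a language object
      expr2 <- lapply(expr, str2lang)
      # named list: variable name -> definition as a symbol
      defs2 <- defs[1:2] |> deframe() |> as.list() |> lapply(as.name)
      # substitute the symbols, then turn each result back into a string
      s <- sapply(expr2, \(expr) deparse1(do.call(substitute, list(expr, defs2))))
      # drop the backticks deparse1 puts around non-syntactic names
      s <- gsub("`", "", s)
      # missing formulas come back as the string "NA"; restore a real NA
      ifelse(s == "NA", NA, s)
    }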
    
    # test run
    DF %>%
      mutate(translation = subst(formula, pick(variable, definition)))
    

    giving

    # A tibble: 6 × 4
      variable definition      formula        translation                           
      <chr>    <chr>           <chr>          <chr>                                 
    1 cs       cost of salad   cs=cb+ccel+cc  cost of salad = cost of bread + cost …
    2 csp      cost of soup    csp=cc+cel+crt cost of soup = cost of chicken + cel …
    3 cb       cost of bread   <NA>           <NA>                                  
    4 cc       cost of chicken <NA>           <NA>                                  
    5 ccel     cost of celery  <NA>           <NA>                                  
    6 ccrt     cost of carrot  <NA>           <NA>                                  
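
To see what approach 1 does to a single formula, here is a minimal base-R sketch of the same steps without dplyr or tibble. The objects formula, defs, expr, translated and out are illustrative names, and only the four definitions this formula needs are included.

```r
# One formula and the definitions of the variables it uses.
formula <- "cs=cb+ccel+cc"
defs <- list(
  cs   = as.name("cost of salad"),
  cb   = as.name("cost of bread"),
  ccel = as.name("cost of celery"),
  cc   = as.name("cost of chicken")
)

# Parse the string; the result is the call `=`(cs, cb + ccel + cc).
expr <- str2lang(formula)

# substitute() does not evaluate its first argument, so do.call() is used
# to pass the already-parsed call in place of a literal expression.
translated <- do.call(substitute, list(expr, defs))

# deparse1() wraps the multi-word names in backticks; strip them.
out <- gsub("`", "", deparse1(translated))
print(out)
```

Symbols without an entry in defs are left as they are, which is why cel stays untranslated in the output above.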
    

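    2) Alternatively, we could use gsubfn to perform the substitutions. This assumes that the variables consist entirely of word characters. It gives the same translation, except that the spacing of the original formula is kept, so no spaces are added around = and +.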

    library(gsubfn)
    
    subst2 <- function(expr, defs) {
      defs2 <- defs[1:2] |> deframe() |> as.list()
      gsubfn("\\w+", defs2, expr) 
    }
     
    DF |>
      mutate(translation = subst2(formula, pick(variable, definition)))
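
If adding the gsubfn dependency is not an option, the same single-pass, whole-token replacement can be sketched in base R with gregexpr and the regmatches replacement function. This is an illustrative alternative rather than part of the answer above: translate and the shortened defs vector are hypothetical names, and variables are again assumed to consist only of word characters.

```r
# Named character vector: variable name -> definition.
defs <- c(
  cs   = "cost of salad",
  cb   = "cost of bread",
  ccel = "cost of celery",
  cc   = "cost of chicken"
)

translate <- function(x, defs) {
  out <- x
  ok <- !is.na(x)                      # leave missing formulas untouched
  m <- gregexpr("\\w+", x[ok])         # locate every run of word characters
  tokens <- regmatches(x[ok], m)       # one character vector per formula
  replaced <- lapply(tokens, function(tok) {
    hit <- tok %in% names(defs)        # tokens that have a definition
    tok[hit] <- unname(defs[tok[hit]]) # swap in the definition
    tok                                # unknown tokens stay unchanged
  })
  regmatches(out[ok], m) <- replaced   # write the tokens back in place
  out
}

res <- translate(c("cs=cb+ccel+cc", NA), defs)
print(res)
```

Because every token is looked up once against the original string, a definition that happens to contain another variable's name is not substituted a second time, which a loop of successive gsub calls would not guarantee.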
    

    Note

    library(dplyr)
    library(tibble)
    
    DF <- tibble(
      variable = c("cs", "csp", "cb", "cc", "ccel", "ccrt"),
      definition = c(
        "cost of salad", "cost of soup", "cost of bread", "cost of chicken",
        "cost of celery", "cost of carrot"
      ),
      formula = c("cs=cb+ccel+cc", "csp=cc+cel+crt", NA, NA, NA, NA),
    )
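
In the data above the second formula uses cel and crt, which do not appear in the variable column, so both approaches leave them untranslated. With around 100 variables and formulas it may help to flag such names before translating. The following is a minimal sketch using base R's all.vars; the helper name undefined_vars is hypothetical, and it assumes every non-missing formula parses as R code.

```r
variable <- c("cs", "csp", "cb", "cc", "ccel", "ccrt")
formula  <- c("cs=cb+ccel+cc", "csp=cc+cel+crt", NA)

undefined_vars <- function(formula, variable) {
  f <- formula[!is.na(formula)]        # skip rows without a formula
  # collect every symbol used in any formula
  used <- unique(unlist(lapply(f, function(s) all.vars(str2lang(s)))))
  setdiff(used, variable)              # symbols that have no definition
}

missing_defs <- undefined_vars(formula, variable)
print(missing_defs)
```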