r, function, pmap

pmap_dfr inside of larger function not finding object created within larger function


I have 2 functions. The primary function brings in data, manipulates it, passes it to the secondary function (using pmap_dfr) for more processing, takes those values and returns results. The only problem is that pmap_dfr cannot seem to find the object I create in that primary function. I'm guessing it's a scoping issue, but I don't really know how to fix it if I don't want to pop that processed data into the global environment. Any advice?

reprex

df1 <- data.frame(x = rnorm(100),
                  y = rnorm(100, 1, .3),
                  z = rnorm(100, 15, 3))

#secondary function
measures <- function(data_name = "df2", col_name_diff = "diff_xy"){
  dataf <- eval(sym(data_name))

  mean = (sum(dataf[col_name_diff], na.rm = T))/nrow(dataf)
  variance = sd(unlist(dataf[col_name_diff]), na.rm = T)^2
  
  x <- data.frame(mean = mean, 
                   variance = variance)  
  
  x
}

#primary function
aggregate_results <- function(dataset_name = "df1"){
  dataf <- eval(sym(dataset_name))
  
  df2 <- dataf %>% 
    mutate(diff_xy = x-y,
           diff_yx = y-x,
           diff_yz = y-z)
 
  data_name <- "df2"
  col_name_diff <- df2 %>% select(contains("diff")) %>% names
  params <- crossing(data_name, col_name_diff)
  
  results <- pmap_dfr(.f = measures, .l = params)
}

aggregate_results()

#get
Error: object 'df2' not found

#want
     name       mean variance
1 diff_xy  -1.019687 1.164101
2 diff_yx   1.019687 1.164101
3 diff_yz -14.237093 9.755626

Solution

  • Normally one passes the object itself rather than its name. If there is a good reason to pass the name, then the environment holding that name should be passed as well. Otherwise, under R's scoping rules, a name that is not found inside the current function is looked up in the environment in which the function was defined, not in the caller. Here measures is defined in the global environment, so it never sees df2, which exists only inside aggregate_results. The lines marked ## have been added or modified. The first line so marked is there to make the code reproducible.

    library(dplyr); library(purrr); library(tidyr); set.seed(12) ##
    
    df1 <- data.frame(x = rnorm(100),
                      y = rnorm(100, 1, .3),
                      z = rnorm(100, 15, 3))
    
    measures <- function(data_name = "df2", col_name_diff = "diff_xy", 
        envir = parent.frame()){ ##
    
      dataf <- get(data_name, envir) ##
      mean = (sum(dataf[col_name_diff], na.rm = T))/nrow(dataf)
      variance = sd(unlist(dataf[col_name_diff]), na.rm = T)^2
      
      x <- data.frame(mean = mean, 
                       variance = variance)
      x
    }
    
    aggregate_results <- function(dataset_name = "df1", envir = parent.frame()){ ##
    
      dataf <- get(dataset_name, envir) ##
      df2 <- dataf %>% 
        mutate(diff_xy = x-y,
               diff_yx = y-x,
               diff_yz = y-z)
      data_name <- "df2"
      col_name_diff <- df2 %>% select(contains("diff")) %>% names
      params <- crossing(data_name, col_name_diff)
      
      results <- pmap_dfr(.f = measures, .l = params, envir = environment()) ##
      results ## return the value visibly so the call below prints it
    }
    
    aggregate_results()
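
For comparison, here is a minimal sketch of the first option mentioned above, passing the object itself instead of its name. It is not the asker's original interface: the argument names dataf and col_name_diff, the use of map_dfr with an anonymous function in place of pmap_dfr over a crossing() table, and the added name column (included to match the "want" output) are all assumptions made for illustration.

```r
library(dplyr)
library(purrr)

set.seed(12)

df1 <- data.frame(x = rnorm(100),
                  y = rnorm(100, 1, .3),
                  z = rnorm(100, 15, 3))

# Secondary function: takes the data frame itself plus one column name,
# so nothing has to be looked up by name in another environment.
measures <- function(dataf, col_name_diff) {
  values <- dataf[[col_name_diff]]
  data.frame(name = col_name_diff,  # assumed column, to label each row
             mean = sum(values, na.rm = TRUE) / nrow(dataf),
             variance = var(values, na.rm = TRUE))  # same as sd(...)^2
}

# Primary function: also takes the data frame itself.
aggregate_results <- function(dataf) {
  df2 <- dataf %>%
    mutate(diff_xy = x - y,
           diff_yx = y - x,
           diff_yz = y - z)

  col_name_diff <- df2 %>% select(contains("diff")) %>% names()

  # The anonymous function is created inside aggregate_results, so df2 is
  # found through ordinary lexical scoping and passed on as an object.
  map_dfr(col_name_diff, function(col) measures(df2, col))
}

results <- aggregate_results(df1)
results
```

Because df2 travels as an argument, neither function needs get(), eval(sym(...)) or an envir parameter, and nothing is written to the global environment.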