I have two functions. The primary function brings in data, manipulates it, passes it to the secondary function (using pmap_dfr) for more processing, then takes those values and returns the results. The only problem is that the secondary function, when called through pmap_dfr, cannot find the object I create inside the primary function. I'm guessing it's a scoping issue, but I don't know how to fix it without putting that processed data into the global environment. Any advice?
reprex
```
df1 <- data.frame(x = rnorm(100),
                  y = rnorm(100, 1, .3),
                  z = rnorm(100, 15, 3))

#secondary function
measures <- function(data_name = "df2", col_name_diff = "diff_xy"){
  dataf <- eval(sym(data_name))
  mean = (sum(dataf[col_name_diff], na.rm = T))/nrow(dataf)
  variance = sd(unlist(dataf[col_name_diff]), na.rm = T)^2
  x <- data.frame(mean = mean,
                  variance = variance)
  x
}

#primary function
aggregate_results <- function(dataset_name = "df1"){
  dataf <- eval(sym(dataset_name))
  df2 <- dataf %>%
    mutate(diff_xy = x-y,
           diff_yx = y-x,
           diff_yz = y-z)
  data_name <- "df2"
  col_name_diff <- df2 %>% select(contains("diff")) %>% names
  params <- crossing(data_name, col_name_diff)
  results <- pmap_dfr(.f = measures, .l = params)
}

aggregate_results()

#get
Error: object 'df2' not found

#want
     name       mean variance
1 diff_xy  -1.019687 1.164101
2 diff_yx   1.019687 1.164101
3 diff_yz -14.237093 9.755626
```
Normally one passes the object itself rather than its name. If there is a good reason to pass the name, then the environment holding that name should be passed as well. Otherwise, R's scoping rules mean that a name not found inside the current function is looked up in the environment in which that function was defined, not in the caller. Here measures is defined at top level, so it never sees df2, which exists only inside aggregate_results. In the revised version of the question's code, the lines marked ## have been added or modified. The first line so marked is there to make the code reproducible.
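As a stripped-down illustration of that lookup rule, here is a minimal sketch in base R. The names lookup_lexical, lookup_in, caller and local_obj are hypothetical and exist only for this demo; it assumes no object called local_obj exists in the global environment.

```r
# Hypothetical helpers, for illustration only.

# Searches its own frame, then the environment where it was defined
# (the top level), so it cannot see variables local to its caller.
lookup_lexical <- function(name) {
  exists(name)
}

# Searches the environment it is handed; by default, the caller's frame.
lookup_in <- function(name, envir = parent.frame()) {
  exists(name, envir = envir)
}

caller <- function() {
  local_obj <- 1  # lives only inside caller()
  c(lexical = lookup_lexical("local_obj"),
    passed  = lookup_in("local_obj"))
}

res_scope <- caller()
res_scope
```

The first element should be FALSE and the second TRUE, which is the same difference as between the question's eval(sym(data_name)) and a lookup that is given the right environment.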
```r
library(dplyr); library(purrr); library(tidyr); set.seed(12) ##

df1 <- data.frame(x = rnorm(100),
                  y = rnorm(100, 1, .3),
                  z = rnorm(100, 15, 3))

measures <- function(data_name = "df2", col_name_diff = "diff_xy",
                     envir = parent.frame()){ ##
  dataf <- get(data_name, envir) ##
  mean = (sum(dataf[col_name_diff], na.rm = T))/nrow(dataf)
  variance = sd(unlist(dataf[col_name_diff]), na.rm = T)^2
  x <- data.frame(mean = mean,
                  variance = variance)
  x
}

aggregate_results <- function(dataset_name = "df1", envir = parent.frame()){ ##
  dataf <- get(dataset_name, envir) ##
  df2 <- dataf %>%
    mutate(diff_xy = x-y,
           diff_yx = y-x,
           diff_yz = y-z)
  data_name <- "df2"
  col_name_diff <- df2 %>% select(contains("diff")) %>% names
  params <- crossing(data_name, col_name_diff)
  # envir is forwarded through pmap_dfr's ... to measures(), so that
  # measures() looks up "df2" in this function's own environment
  results <- pmap_dfr(.f = measures, .l = params, envir = environment()) ##
  results ## return visibly so that the call below prints the result
}

aggregate_results()
```
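For comparison, here is a rough sketch of the first option mentioned above, passing the object itself instead of its name. The function names measures_df and aggregate_results_df are hypothetical, not part of the question's code, and the name column is an addition to match the desired output. The mean is computed with mean(), which agrees with the original sum/nrow formula only when the column contains no NA values.

```r
library(dplyr)
library(purrr)

set.seed(12)
df1 <- data.frame(x = rnorm(100),
                  y = rnorm(100, 1, .3),
                  z = rnorm(100, 15, 3))

# Hypothetical variant of measures(): receives the data frame directly,
# so no name lookup and no environment argument are needed.
measures_df <- function(dataf, col_name_diff) {
  values <- dataf[[col_name_diff]]
  data.frame(name = col_name_diff,
             mean = mean(values, na.rm = TRUE),
             variance = var(values, na.rm = TRUE))
}

# Hypothetical variant of aggregate_results(): df2 stays local.
aggregate_results_df <- function(dataf) {
  df2 <- dataf %>%
    mutate(diff_xy = x - y,
           diff_yx = y - x,
           diff_yz = y - z)
  col_name_diff <- df2 %>% select(contains("diff")) %>% names()
  # The anonymous function is defined here, so it can see df2 directly.
  map_dfr(col_name_diff, function(col) measures_df(df2, col))
}

res <- aggregate_results_df(df1)
res
```

Because the anonymous function is created inside aggregate_results_df, df2 is found by the ordinary scoping rules and nothing has to be placed in the global environment.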