Tags: r, dataframe, montecarlo

Issue outputting data generated from a user-defined function that uses sapply()


I am trying to summarize data generated by looping over a data set with sapply(), and I am not sure why I can't access the data frame that is generated. I'm setting up a Monte Carlo simulation: I have a function that sets up parameters and produces an estimate, and I want to apply that function a set number of times to each data point in a data set. To do this, I call replicate() on my function inside a wrapper function that uses sapply(). It appears to work, but I cannot access the data afterwards to describe the resulting distributions of estimates; the data frame is printed rather than stored. That's nice, but I now need to calculate means and confidence intervals from it, and probably plot some of the results.

Here is basically what I'm trying to do:

repapply <- function(iter, datapoint){
  estimates <- sapply(datapoint, function(datapoint){
    replicate(n=iter, expr=generate_data(datapoint))
  })
  estimatesdf <- data.frame(estimates)
  return(estimatesdf)
}
#test:
repapply(1000, measure)

Could anyone explain why there isn't any output dataframe? It prints the information below:

X1                                                   X2
1 476.454335, 6.240725, 4.433396, 24.017384, 36.900104 594.890067, 2.310075, 7.210158, 21.379092, 30.256849
X3                                                   X4
1 359.167706, 5.817891, 7.276368, 20.776742, 23.539489 459.848602, 3.826445, 4.319803, 23.774576, 52.130509
X5                                                   X6
1 504.624220, 8.159456, 4.110860, 23.805009, 42.983076 578.252014, 6.749054, 5.880862, 23.312351, 42.320465
X7                                                   X8
1 427.196750, 7.458934, 3.295953, 24.764725, 45.647360 284.724297, 5.234101, 6.481678, 20.159478, 42.160186
X9                                                  X10
1 307.605356, 4.386591, 5.562230, 22.711697, 3.675961 418.109465, 5.618156, 3.135784, 24.503502, 34.891379
 ...

Solution

  • Short Answer:

    You have to assign the result of repapply(1000, measure) to an object. In other words, you have to give the returned data frame a name. For instance:

    df <- repapply(1000, measure)

    Why:

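    Objects defined inside a function are local to that function's environment, so the name estimatesdf does not exist once the function has finished. When you return estimatesdf, you are returning the data.frame it refers to, not the name. If you call the function at the top level without assigning the result, R simply prints the returned value and then discards it, which is why you see the output but have no object to work with. For the same reason, you could compress the last two lines of your function into return(data.frame(estimates)) and get the same result.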
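
To make this concrete, here is a minimal sketch of storing the result and summarizing it. The question does not show generate_data() or measure, so both are hypothetical stand-ins here: generate_data() is assumed to return a single numeric estimate per call, and measure is a made-up vector of three data points. The percentile interval is one possible choice of confidence interval, not necessarily the one you need.

```r
# Hypothetical stand-in for generate_data(); the real function is not shown
# in the question. It returns one noisy estimate centred on the data point.
generate_data <- function(datapoint) {
  rnorm(1, mean = datapoint, sd = 1)
}

repapply <- function(iter, datapoint) {
  estimates <- sapply(datapoint, function(dp) {
    replicate(n = iter, expr = generate_data(dp))
  })
  data.frame(estimates)  # value of the last expression is returned
}

set.seed(1)               # only so the sketch is reproducible
measure <- c(10, 20, 30)  # made-up data points

# Assigning the result stores it instead of printing it.
df <- repapply(1000, measure)

# One column per data point, one row per replicate.
col_means <- colMeans(df)

# Percentile intervals: 2.5% and 97.5% quantiles of each column.
col_ci <- sapply(df, quantile, probs = c(0.025, 0.975))
```

With the data frame stored in df, the usual tools such as summary(df) or hist(df$X1) can be applied to it directly.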


    Alternatively:

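    An ordinary assignment (<-) inside a function does not modify a pre-existing object outside it; it creates a new local object with the same name, and the outer object keeps its old value. To modify an object outside the function's scope, you need the superassignment operator (<<-). If you define estimatesdf outside the function (e.g., by setting it equal to 0), change the assignment inside the function to estimatesdf <<- data.frame(estimates), and eliminate the return() call, then running repapply(1000, measure) would set the outer estimatesdf to the desired data.frame. Assigning the returned value, as in the short answer, is usually the cleaner option because the function then has no side effects.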
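
The following minimal sketch shows how assignment inside a function interacts with an object of the same name outside it. The function names local_assign() and super_assign() are hypothetical and exist only for this illustration.

```r
estimatesdf <- 0  # pre-existing object outside the functions

# Ordinary assignment creates a local variable; the outer object is untouched.
local_assign <- function() {
  estimatesdf <- data.frame(x = 1:3)
  invisible(NULL)
}
local_assign()
after_local <- is.data.frame(estimatesdf)  # FALSE: still the number 0

# Superassignment looks up the name in the enclosing environments
# and replaces the existing outer object.
super_assign <- function() {
  estimatesdf <<- data.frame(x = 1:3)
  invisible(NULL)
}
super_assign()
after_super <- is.data.frame(estimatesdf)  # TRUE: outer object replaced
```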