
Fast handling of rules in a simulation


If you only have a few rules in a discrete event simulation, this is not critical, but if you have many of them and they can interfere with each other, you may want to track which rules are used and where.

Here is a simple example which shows that I lose a factor of 100 in speed. Assume you run a simulation and one (of many) rules is: select the states with time less than 5:

> a <- rnorm(100, 50, 10)
> print(summary(microbenchmark::microbenchmark(a[a < 5], times = 1000L, unit = "us")))
   expr  min   lq     mean median   uq    max neval
a[a < 5] 0.76 1.14 1.266745  1.141 1.52 11.404  1000

> myrule <- "a < a0" # The rule could be read from a file.
> myfun <- function(a0) {
+   return(eval(parse(text = myrule)))
+ }
> print(summary(microbenchmark::microbenchmark(a[myfun(5)], times = 1000L, unit = "us")))
    expr    min      lq     mean  median      uq     max neval
a[myfun(5)] 137.61 140.271 145.6047 141.411 142.932 343.644  1000

Note: I don't think that I need an extra rete package which can do the bookkeeping efficiently. But if there are other opinions, let me know...


Solution

  • Let's profile this:

    Rprof()
    for (i in 1:1e4) a[myfun(5)]
    Rprof(NULL)
    summaryRprof()
    
    #$by.self
    #             self.time self.pct total.time total.pct
    #"parse"           0.36    69.23       0.48     92.31
    #"structure"       0.04     7.69       0.06     11.54
    #"myfun"           0.02     3.85       0.52    100.00
    #"eval"            0.02     3.85       0.50     96.15
    #"stopifnot"       0.02     3.85       0.06     11.54
    #"%in%"            0.02     3.85       0.02      3.85
    #"anyNA"           0.02     3.85       0.02      3.85
    #"sys.parent"      0.02     3.85       0.02      3.85
    #
    #$by.total
    #               total.time total.pct self.time self.pct
    #"myfun"              0.52    100.00      0.02     3.85
    #"eval"               0.50     96.15      0.02     3.85
    #"parse"              0.48     92.31      0.36    69.23
    #"srcfilecopy"        0.12     23.08      0.00     0.00
    #"structure"          0.06     11.54      0.04     7.69
    #"stopifnot"          0.06     11.54      0.02     3.85
    #".POSIXct"           0.06     11.54      0.00     0.00
    #"Sys.time"           0.06     11.54      0.00     0.00
    #"%in%"               0.02      3.85      0.02     3.85
    #"anyNA"              0.02      3.85      0.02     3.85
    #"sys.parent"         0.02      3.85      0.02     3.85
    #"match.call"         0.02      3.85      0.00     0.00
    #"sys.function"       0.02      3.85      0.00     0.00
    

    Most of the time is spent in parse. We can confirm this with a benchmark:

    library(microbenchmark)
    microbenchmark(a[myfun(5)], times = 1000L, unit = "us")
    #Unit: microseconds
    #        expr    min     lq     mean median     uq     max neval
    # a[myfun(5)] 67.347 69.141 72.12806 69.909 70.933 160.303  1000
    
    a0 <- 5
    microbenchmark(parse(text = myrule), times = 1000L, unit = "us")
    #Unit: microseconds
    #                 expr    min     lq     mean median     uq     max neval
    # parse(text = myrule) 62.483 64.275 64.99432 64.787 65.299 132.903  1000
    

    If reading the rules as text from a file is a hard requirement, I don't think there is a way to speed this up. Of course, you should not parse the same rule repeatedly, but I assume you know that; a minimal sketch of parsing each rule only once follows below.
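
    For example (a sketch only, assuming the rule texts were read from a file; the rule names and the readLines() call are placeholders):

    # Parse every rule text once up front, then reuse the stored call.
    # rules_text could e.g. come from readLines("rules.txt").
    rules_text <- c(time_lt = "a < a0")
    rules <- lapply(rules_text, function(txt) parse(text = txt)[[1]])
    
    a  <- rnorm(100, 50, 10)
    a0 <- 5
    a[eval(rules[["time_lt"]])]  # only eval() runs inside the simulation loop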

    Edit in response to a comment, providing more explanation:

    You should store your rules as quoted expressions (e.g., in a list; use saveRDS() if you need them in a file):

    myrule1 <- quote(a < a0)
    myfun1 <- function(rule, a, a0) {eval(rule)}
    
    microbenchmark(a[myfun1(myrule1, a, 30)], times = 1000L, unit = "us")
    #Unit: microseconds
    #                      expr   min    lq     mean median    uq    max neval
    # a[myfun1(myrule1, a, 30)] 1.792 2.049 2.286815  2.304 2.305 30.217  1000
    
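    If the rules have to live in a file, a list of quoted expressions can be written out with saveRDS() and read back with readRDS(); the file name below is just an example:

    rules <- list(time_lt = quote(a < a0), time_gt = quote(a > a0))
    saveRDS(rules, "rules.rds")    # hypothetical file name
    rules <- readRDS("rules.rds")  # the quoted calls survive the round trip
    a[myfun1(rules$time_lt, a, 30)]
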

    For convenience, you could then make that list of expressions an S3 object and create a nice print method for it in order to get a better overview.
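
    A minimal sketch of that idea (the class name "ruleset" and the rule names are arbitrary):

    ruleset <- structure(list(time_lt = quote(a < a0), time_gt = quote(a > a0)),
                         class = "ruleset")
    
    print.ruleset <- function(x, ...) {
      # print each rule as "name: deparsed expression"
      for (nm in names(x)) cat(nm, ": ", deparse(x[[nm]]), "\n", sep = "")
      invisible(x)
    }
    
    ruleset
    #time_lt: a < a0
    #time_gt: a > a0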