rrandom

possible bug in `rbinom()` for large numbers of trials


I've been writing some code that iteratively performs binomial draws (using rbinom) and for some callee arguments I can end up with the size being large, which causes R (3.1.1, both official or homebrew builds tested—so unlikely to be compiler related) to return an unexpected NA. For example:

rbinom(1,2^32,0.95)

is what I'd expect to work, but gives NA back. However, running with size=2^31 or prob≤0.5 works.

The fine manual mentions inversion being used when size < .Machine$integer.max is false, could this be the issue?


Solution

  • Looking at the source rbinom does the equivalent (in C code) of the following for such large sizes:

    qbinom(runif(n), size, prob, FALSE)
    

    And indeed:

    set.seed(42)
    rbinom(1,2^31,0.95)
    #[1] 2040095619
    set.seed(42)
    qbinom(runif(1), 2^31, 0.95, F)
    #[1] 2040095619
    

    However:

    set.seed(42)
    rbinom(1,2^32,0.95)
    #[1] NA
    set.seed(42)
    qbinom(runif(1), 2^32, 0.95, F)
    #[1] 4080199349
    

    As @BenBolker points out rbinom returns an integer and if the return value is larger than .Machine$integer.max, e.g., larger than 2147483647 on my machine, NA gets returned. In contrast qbinom returns a double. I don't know why and it doesn't seem to be documented.

    So, it seems like there is at least undocumented behavior and you should probably report it.