Under what circumstances is `vapply()` slower than `sapply()`?

The documentation for the *apply() functions states:

vapply is similar to sapply, but has a pre-specified type of return value, so it can be safer (and sometimes faster) to use. [Emphasis mine.]

It makes sense to me why it would be faster (less time wasted checking types). But they could have said something like 'vapply() is as fast or faster than sapply()' and chose not to, so I read 'sometimes faster' as potentially meaning 'for most tasks vapply() is on average faster, but in some cases it could be the same speed, or even slower', which seems quite odd to me. Why would it ever be slower? Advanced R, in contrast, states 'vapply() is faster than sapply()', which is fairly categorical.

Am I misunderstanding this, or are there circumstances in which vapply() is slower than sapply(), and if so, what are they?

For example, the reason could be a difference in garbage collection, the speed with which certain types are processed, memory allocation, or something else (these are wild guesses).

Research I've done:

Surprisingly, I couldn't find anything addressing this online, on StackOverflow or elsewhere. There are plenty of questions that reference vapply() and its safety. In a few comparisons, while vapply() was as fast or faster than sapply(), there were many iterations which were faster than the slowest vapply() iteration (and one where apply() was significantly faster than either lapply() or vapply()). So long story short, I'm a bit lost!

Any help you could provide would be greatly appreciated!

Solution

The checks which make `vapply()` safe are not free

In order to contrive circumstances where vapply() is slower than sapply(), let's look at the source. Most of the work in sapply() is done by lapply(). The C code for lapply() is quite simple. The relevant part is (comments mine):

// Allocate a vector for the output with the length of the input vector/list
ans = PROTECT(allocVector(VECSXP, n));
// Loop through input list, apply relevant function and 
// assign result to each respective element of the output list
for(int i = 0; i < n; i++) {
    defineVar(install("x"), VECTOR_ELT(list, i), rho);
    SET_VECTOR_ELT(ans, i, eval(expr, rho));
}

Essentially this creates an output list of the same length as the input list, iterating through it to set every element to the result of the user-provided function applied to each element of the input. sapply() then runs the result through simplify2array().
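As a rough illustration of that two-step structure (a sketch, not the actual source of sapply(); the input list `x` is made up for the example), calling simplify2array() on the output of lapply() gives the same result as sapply() in a simple case:

```r
# Hypothetical input: a small list of integer vectors
x <- list(1:3, 4:6)

# One step: sapply() applies the function and simplifies the result
via_sapply <- sapply(x, sum)

# Two steps: lapply() builds a list, simplify2array() collapses it.
# higher = FALSE mirrors sapply()'s default of simplify = TRUE.
via_lapply <- simplify2array(lapply(x, sum), higher = FALSE)

identical(via_sapply, via_lapply)
```

The extra simplification pass over the list is work that lapply() alone never does.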

Conversely, the C code for vapply() does a lot more work. A lot of this is optimisation which makes it quicker than sapply(), e.g. allocating an atomic vector immediately as the output, rather than allocating a list and then simplifying into a vector. However, it also contains this:

// Check that the result is the correct length for the output vector/list
if (length(val) != commonLen)
error(_("values must be length %d,\n but FUN(X[[%d]]) result is length %d"),
        commonLen, i+1, length(val));

We tell vapply() the length and type of the output. This means that if, for example, we tell vapply() that the output is integer(1), it needs to check that each iteration produces an integer vector of length 1.
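These checks can be seen from R by deliberately breaking the contract. The snippet below is only an illustration with made-up inputs: it catches the errors vapply() raises when a result has the wrong length or the wrong type.

```r
# Conforming case: every result is an integer vector of length 1
ok <- vapply(1:3, function(i) i * 2L, integer(1))

# Wrong length: each result has length 2, but integer(1) was promised
wrong_length <- tryCatch(
    vapply(1:3, function(i) c(i, i), integer(1)),
    error = function(e) e
)

# Wrong type: each result is character, but integer(1) was promised
wrong_type <- tryCatch(
    vapply(1:3, function(i) as.character(i), integer(1)),
    error = function(e) e
)

# Print the error messages produced by the failed checks
conditionMessage(wrong_length)
conditionMessage(wrong_type)
```

sapply() would accept both of the failing functions silently and return a matrix in both cases, an integer one in the first and a character one in the second.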

A case where those checks are expensive

One way to create costly checks is to return a value where checking the length is expensive. Consider the simple example:

lapply(1, \(i) seq(1e9))

lapply() will run very quickly here. seq(1e9) produces an ALTREP, an alternate representation. This means that rather than having to allocate a vector of length 1e9, it allocates a much smaller object which essentially holds the start value, end value and increment. However, the docs for ALTREP state:

To existing C code ALTREP objects look like ordinary R objects.

This means vapply() does not know that this is an ALTREP, and so it needs to check the length in a very costly way (much more costly than just running length() in R, which knows what an ALTREP is).
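To see how cheap the compact representation is at the R level, the following sketch (assuming R 3.5 or later, where ALTREP compact sequences exist; the variable names are made up) creates the sequence and queries it without ever building the full vector:

```r
n <- 1e9

# On R >= 3.5 this should return almost immediately, because no
# 1e9-element vector is allocated at this point
system.time(x <- seq(n))

# Both of these are answered from the compact representation
length(x)
typeof(x)
```

On older versions of R the same code would allocate roughly 4 GB for the integer vector.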

sapply() also has to do something costly. It basically does this:

simplify2array(list(seq(1e9)))

This creates a one-column matrix with 1e9 rows, i.e. it evaluates the ALTREP into a standard integer vector, so it allocates a large vector in RAM.
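The shape of that result is easy to confirm on a small scale. This sketch swaps 1e9 for 5 so that it runs instantly; the size is the only thing changed:

```r
# Same call as above, but with a 5-element sequence
small <- simplify2array(list(seq(5)))

# One column, with as many rows as the sequence has elements
dim(small)

# sapply() produces the same matrix for the equivalent call
identical(small, sapply(1, function(i) seq(5)))
```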

So vapply() and sapply() both have to do something considerably more expensive than lapply(). The question is: which is costlier?

Benchmarking the contrived case

Let's put this to the test:

results <- bench::mark(
    min_iterations = 3,
    max_iterations = 100,
    check = FALSE,
    time_unit = "s",
    lapply = {
        lapply(1, \(i) seq(1e9))
    },
    sapply = {
        sapply(1, \(i) seq(1e9))
    },
    vapply = {
        vapply(1, \(i) seq(1e9), numeric(1e9))
    }
)

Results

  expression         min     median  `itr/sec` mem_alloc `gc/sec` n_itr  n_gc total_time
  <bch:expr>       <dbl>      <dbl>      <dbl> <bch:byt>    <dbl> <int> <dbl>      <dbl>
1 lapply      0.00000954  0.0000277 31233.            0B   0        100     0    0.00320
2 sapply     23.3        27.9           0.0309    11.2GB   0.0309     3     3   97.0    
3 vapply     71.8        79.6           0.0126    22.4GB   0.0251     3     6  239.

We can see here that vapply() is substantially slower than sapply(). There are some caveats: these tests were run only on my PC, and it was so slow that I only did three iterations. Also, I did have to do some playing around to get here. With a vector shorter than 1e9, vapply() is faster than sapply().
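If you want to look for the crossover point on your own machine, a scaled-down comparison along these lines may help. It is a minimal sketch: `time_both()` is a hypothetical helper, it uses a single base R system.time() run rather than bench::mark(), and the sizes are arbitrary, so the numbers will be noisy.

```r
# Hypothetical helper: time one sapply() and one vapply() call for a
# sequence of length n, mirroring the calls in the benchmark above
time_both <- function(n) {
    c(
        sapply = system.time(sapply(1, function(i) seq(n)))[["elapsed"]],
        vapply = system.time(vapply(1, function(i) seq(n), numeric(n)))[["elapsed"]]
    )
}

# Arbitrary, small sizes; increase them gradually if you have the RAM
sizes <- c(1e4, 1e5, 1e6)

# One row per size, one column per function (elapsed seconds)
timings <- t(vapply(sizes, time_both, numeric(2)))
rownames(timings) <- format(sizes, scientific = TRUE)
timings
```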

Plot of results

ggplot2::autoplot(results) +
    ggplot2::labs(title = "Comparison of results", y = "Time (log scale)", x = "Expression")

Note that time is on a log scale.

It is worth pointing out that, fun as it was to engineer this situation, this is not typical. In the vast majority of tasks for which R is used, vapply() is likely to be considerably faster than sapply(). Also, as you know, there are other benefits, such as that vapply() guarantees the return type.
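A well-known case where the type guarantee matters is zero-length input. In this small example (the variable names are made up), sapply() falls back to an empty list, while vapply() returns an empty vector of the declared type:

```r
empty <- integer(0)

# sapply() has nothing to simplify, so it returns an empty list
from_sapply <- sapply(empty, function(i) i * 2)

# vapply() returns an empty vector of the type given in FUN.VALUE
from_vapply <- vapply(empty, function(i) i * 2, numeric(1))

class(from_sapply)  # "list"
class(from_vapply)  # "numeric"
```

Code downstream of the vapply() call can rely on getting a numeric vector regardless of the input length.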

Update: I tried this with R 4.4 a year later (October 2024) with a much faster PC with more RAM. I was able to replicate the results on Windows, but on WSL (Ubuntu) `vapply()` was faster than `sapply()`. So even this contrived situation seems to be platform-dependent.