
R purrr package map() vs pmap() difference


I am trying to understand the behavior of purrr::map() vs purrr::pmap() for the following use case. I expected the same result from both, but it looks like purrr::map() iterates through the list only once.

# using pmap
pmap(.l = list(x = c(1,6)) ,.f = function(x) {substr("costcopull",x,x-1+5)})
# result 
[[1]]
[1] "costc"

[[2]]
[1] "opull"

# using map
map(.x = list(x = c(1,6)) ,.f = function(x) {substr("costcopull",x,x-1+5)})
$x
[1] "costc"

I was expecting both results to be the same, since the function takes the single input "x".

But they aren't.


Solution

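  • map applies a function to each element of a vector / list.
    In your example, the list you pass has only one element: c(1,6). The function is therefore called once, with x = c(1,6).
    substr is vectorized over its first argument (the string), not over start and stop. With a single string, only the first element of start and stop is used, so only the index 1 is taken into account: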

    x <- c(1,6)
    substr("abcdef",x,1)
    [1] "a"
    

    pmap is especially useful for data frames (which are lists of column vectors), because it processes the columns in parallel, row by row:

    df <- data.frame(x = c(1, 3), y = c(2, 4))
    dput(df)
    #> structure(list(x = c(1, 3), y = c(2, 4)), class = "data.frame",...)
    
    pmap(df, \(x,y) paste(x,y))
    #> [[1]]
    #> [1] "1 2"
    #> 
    #> [[2]]
    #> [1] "3 4"
    

    When you pass a single-element list, pmap walks through that single column row by row:

    pmap(.l = list(x = c(1,3)) ,.f = function(x) {x})
    #> [[1]]
    #> [1] 1
    #> 
    #> [[2]]
    #> [1] 3
    

    This is what happens in your example.

    To sum up, as pointed out by @Darren Tsai, passing c(1,6) directly to map seems to be what you're looking for; you don't need to wrap the vector in a list.
    In this case, the parallel column processing of pmap isn't needed.
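
As a minimal sketch of that suggestion, the snippet below passes the vector itself to map and compares the result with the pmap call from the question. The helper name take5 is hypothetical and only wraps the anonymous function from the question; it assumes purrr is installed.

```r
library(purrr)

# Hypothetical helper name; the body is the function from the question.
take5 <- function(x) substr("costcopull", x, x - 1 + 5)

# map() iterates over the elements of the vector itself: 1, then 6.
res_map <- map(c(1, 6), take5)

# pmap() iterates over the "rows" of the list of arguments.
res_pmap <- pmap(list(x = c(1, 6)), take5)

res_map
#> [[1]]
#> [1] "costc"
#>
#> [[2]]
#> [1] "opull"

# Check that both calls produce the same two substrings.
identical(unlist(res_map), unlist(res_pmap))
#> [1] TRUE
```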