Using the pmap function in R


I am trying to use pmap for the first time, but I am struggling with how to pass the arguments. Here is my test dataset:

  library(data.table)

  overall <- data.table(dependant = rep(c("SPS", "DEPENDANT", "EMP"), 3),
                        exposure = rnorm(9, 0, 1),
                        age = c(1, 2, 3, 1, 2, 3, 3, 1, 2),
                        gender = rep(c("F", "F", "M"), 3))

I was originally doing something like this:

  # spouse
  SPS <- overall[dependant == "SPS", .(exposure = sum(exposure)), by = c("dependant", "age", "gender")]
  sumExposureSPS <- sum(SPS$exposure)
  SPSnormalized <- SPS[, exposure := exposure/sumExposureSPS][, .(age, gender, exposure)]

  
  # dependant (the code in the data is spelled "DEPENDANT")
  DEPENDENT <- overall[dependant == "DEPENDANT", .(exposure = sum(exposure)), by = c("dependant", "age", "gender")]
  sumExposureDEPENDENT <- sum(DEPENDENT$exposure)
  DEPENDENTnormalized <- DEPENDENT[, exposure := exposure/sumExposureDEPENDENT][, .(age, gender, exposure)]


  # employee
  EMP <- overall[dependant == "EMP", .(exposure = sum(exposure)), by = c("dependant", "age", "gender")]
  sumExposureEMP <- sum(EMP$exposure)
  EMPnormalized <- EMP[, exposure := exposure/sumExposureEMP][, .(age, gender, exposure)]

But this is very repetitive: practically only the names differ, and the operation performed is always the same. Therefore I have written a function:

  calculateSubset <- function(overall, 
                              dependantCode){
    
    subset <- overall[dependant == dependantCode, .(exposure = sum(exposure)), by = c("dependant", "age", "gender")]
    sumExposureSubset <- sum(subset$exposure)
    subsetNormalized <- subset[, exposure := exposure/sumExposureSubset][, .(age, gender, exposure)]
    
    return(subset)
  }

So I have reduced this to:

  SPSnormalized <- calculateSubset(overall = overall, 
                                   dependantCode = "SPS")

  DEPENDENTnormalized <- calculateSubset(overall = overall, 
                                         dependantCode = "DEPENDANT")

  EMPnormalized <- calculateSubset(overall = overall, 
                                   dependantCode = "EMP")

However, this is still repetitive. I have seen some examples of people using pmap to get rid of the repetitive code completely.

How do I pass the arguments into pmap such that I get the desired outputs at the end?


Solution

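  • To keep it simple, swap the arguments of calculateSubset so that dependantCode comes first. The map family iterates over a vector or list and passes each element as the first argument of the function, so when only one argument varies, map() is enough and pmap() is not needed.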

    library(data.table)
    library(purrr)  # map(), pmap(), and the %>% pipe

    calculateSubset <- function(dependantCode, df = overall) {
      
      subset <- df[dependant == dependantCode, .(exposure = sum(exposure)), by = c("dependant", "age", "gender")]
      sumExposureSubset <- sum(subset$exposure)
      subsetNormalized <- subset[, exposure := exposure/sumExposureSubset][, .(age, gender, exposure)]
      
      # `:=` updates `subset` by reference, so the returned table already holds
      # the normalized exposure (and still includes the dependant column).
      return(subset)
    }
    
    c("SPS", "DEPENDANT", "EMP") %>% map(calculateSubset)
    # Note: the map() call above is equivalent to, but more concise than, this pmap() call:
    # list(c("SPS", "DEPENDANT", "EMP")) %>% pmap(calculateSubset)
    [[1]]
       dependant age gender exposure
    1:       SPS   1      F 0.522064
    2:       SPS   3      F 0.477936
    
    [[2]]
       dependant age gender   exposure
    1: DEPENDANT   2      F -0.3019417
    2: DEPENDANT   1      F  1.3019417
    
    [[3]]
       dependant age gender  exposure
    1:       EMP   3      M 0.8140009
    2:       EMP   2      M 0.1859991
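
  • If you do want pmap(), a minimal sketch of how the arguments could be passed is below. It rests on a few assumptions that are not in the original post: the helper name normalizedSubset is hypothetical, a seed is set only to make the run reproducible, and the helper returns just age, gender, and exposure (as the hand-written SPSnormalized etc. did). The names of the list given to pmap() are matched to the function's argument names, a length-1 element is recycled for every call, and the names of the first element become the names of the result.

```r
library(data.table)
library(purrr)

set.seed(1)  # assumption: added only so the sketch is reproducible
overall <- data.table(dependant = rep(c("SPS", "DEPENDANT", "EMP"), 3),
                      exposure  = rnorm(9, 0, 1),
                      age       = c(1, 2, 3, 1, 2, 3, 3, 1, 2),
                      gender    = rep(c("F", "F", "M"), 3))

# Hypothetical helper: same steps as calculateSubset, but it returns
# only the age, gender, and normalized exposure columns.
normalizedSubset <- function(dependantCode, df) {
  agg <- df[dependant == dependantCode,
            .(exposure = sum(exposure)),
            by = c("dependant", "age", "gender")]
  agg[, exposure := exposure / sum(exposure)]  # normalize within this code
  agg[, .(age, gender, exposure)]
}

codes <- c("SPS", "DEPENDANT", "EMP")

# List names must match the argument names of normalizedSubset().
# set_names(codes) names each code after itself, so the result is named too.
# list(overall) has length 1 and is recycled for every call.
args <- list(dependantCode = set_names(codes),
             df            = list(overall))

results <- pmap(args, normalizedSubset)

results$SPS        # normalized table for spouses
results$DEPENDANT  # normalized table for dependants
results$EMP        # normalized table for employees
```

    The result is a named list, so each table can be picked out by its code instead of being assigned to a separate variable. If a single table is more convenient, rbindlist(results, idcol = "dependant") from data.table should stack the pieces with the code as an extra column.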