I am trying to use pmap for the first time, but I am struggling with passing the arguments. Here is my test dataset:
library(data.table)

overall <- data.table(dependant = rep(c("SPS", "DEPENDANT", "EMP"), 3),
                      exposure = rnorm(9, 0, 1),
                      age = c(1,2,3,1,2,3,3,1,2),
                      gender = rep(c("F", "F", "M"), 3))
I was originally doing something like this:
# spouse
SPS <- overall[dependant == "SPS", .(exposure = sum(exposure)), by = c("dependant", "age", "gender")]
sumExposureSPS <- sum(SPS$exposure)
SPSnormalized <- SPS[, exposure := exposure/sumExposureSPS][, .(age, gender, exposure)]
# dependant
DEPENDENT <- overall[dependant == "DEPENDANT", .(exposure = sum(exposure)), by = c("dependant", "age", "gender")]
sumExposureDEPENDENT <- sum(DEPENDENT$exposure)
DEPENDENTnormalized <- DEPENDENT[, exposure := exposure/sumExposureDEPENDENT][, .(age, gender, exposure)]
# employee
EMP <- overall[dependant == "EMP", .(exposure = sum(exposure)), by = c("dependant", "age", "gender")]
sumExposureEMP <- sum(EMP$exposure)
EMPnormalized <- EMP[, exposure := exposure/sumExposureEMP][, .(age, gender, exposure)]
But this is very repetitive: only the names differ, and the operation is always the same. Therefore I have written a function:
calculateSubset <- function(overall,
                            dependantCode){
  # aggregate exposure for one dependant code
  subset <- overall[dependant == dependantCode, .(exposure = sum(exposure)), by = c("dependant", "age", "gender")]
  sumExposureSubset <- sum(subset$exposure)
  # normalize so that the exposures sum to 1, then keep the columns of interest
  subsetNormalized <- subset[, exposure := exposure/sumExposureSubset][, .(age, gender, exposure)]
  return(subsetNormalized)
}
So I have reduced this to:
SPSnormalized <- calculateSubset(overall = overall,
                                 dependantCode = "SPS")
DEPENDENTnormalized <- calculateSubset(overall = overall,
                                       dependantCode = "DEPENDANT")
EMPnormalized <- calculateSubset(overall = overall,
                                 dependantCode = "EMP")
However, this is still repetitive. I have seen some examples of people using pmap to get rid of the repetitive code completely.
How do I pass the arguments into pmap so that I get the desired outputs at the end?
To keep it simple, swap the arguments of the function calculateSubset. By default, the map family iterates over a vector or list and passes each element as the first argument of the function.
calculateSubset <- function(dependantCode, df = overall){
  subset <- df[dependant == dependantCode, .(exposure = sum(exposure)), by = c("dependant", "age", "gender")]
  sumExposureSubset <- sum(subset$exposure)
  subsetNormalized <- subset[, exposure := exposure/sumExposureSubset][, .(age, gender, exposure)]
  # `:=` updates subset by reference, so the returned table already holds
  # the normalized exposure (and still carries the dependant column)
  return(subset)
}
library(purrr)

c("SPS", "DEPENDANT", "EMP") %>% map(calculateSubset)
# Note that the above map() call is equivalent to, but more concise than, this pmap() call:
# list(c("SPS", "DEPENDANT", "EMP")) %>% pmap(calculateSubset)
[[1]]
   dependant age gender exposure
1:       SPS   1      F 0.522064
2:       SPS   3      F 0.477936

[[2]]
   dependant age gender   exposure
1: DEPENDANT   2      F -0.3019417
2: DEPENDANT   1      F  1.3019417

[[3]]
   dependant age gender  exposure
1:       EMP   3      M 0.8140009
2:       EMP   2      M 0.1859991
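
If you would rather not rely on `overall` being found as a default argument, or want the results labelled by code, the following is a minimal sketch of one way to do it. It assumes the same test data shape as in the question; the seed, the variable names `codes`, `grouped`, `normalized` and `normalizedPmap`, and the choice to return only `age`, `gender` and `exposure` are illustrative assumptions, not part of the original code.

```r
library(data.table)
library(purrr)

# Test data shaped like the question's; the seed is an added assumption
set.seed(1)
overall <- data.table(
  dependant = rep(c("SPS", "DEPENDANT", "EMP"), 3),
  exposure  = rnorm(9, 0, 1),
  age       = c(1, 2, 3, 1, 2, 3, 3, 1, 2),
  gender    = rep(c("F", "F", "M"), 3)
)

# Variant of calculateSubset with no default for df (illustrative)
calculateSubset <- function(dependantCode, df) {
  # aggregate exposure within one dependant code
  grouped <- df[dependant == dependantCode,
                .(exposure = sum(exposure)),
                by = c("dependant", "age", "gender")]
  # normalize so the exposures sum to 1; grouped is a new table,
  # so this does not modify df
  grouped[, exposure := exposure / sum(exposure)]
  grouped[, .(age, gender, exposure)]
}

codes <- c("SPS", "DEPENDANT", "EMP")

# map(): each code becomes the first argument; df is passed along unchanged.
# set_names() names the vector by its own values, so the result is a named list.
normalized <- codes %>%
  set_names() %>%
  map(calculateSubset, df = overall)

# pmap(): the names in the list are matched to the function's argument names
normalizedPmap <- pmap(list(dependantCode = codes), calculateSubset, df = overall)
```

With this, `normalized$SPS` plays the role of `SPSnormalized` from the question. `pmap()` only becomes necessary once more than one argument varies per call; with a single varying argument, `map()` is the simpler choice.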