r dataframe function dplyr aggregate

Run a custom function on a dataframe by group


I want to run a custom function on each group of a data frame.

Here is some sample data:

set.seed(42)
tm <- as.numeric(c("1", "2", "3", "3", "2", "1", "2", "3", "1", "1"))
d <- as.numeric(sample(0:2, size = 10, replace = TRUE))
t <- as.numeric(sample(0:2, size = 10, replace = TRUE))
h <- as.numeric(sample(0:2, size = 10, replace = TRUE))

df <- as.data.frame(cbind(tm, d, t, h))
df$p <- rowSums(df[2:4])

I created a custom function to calculate the value w:

calc <- function(x) {
  data <- x
  w <- (1.27*sum(data$d) + 1.62*sum(data$t) + 2.10*sum(data$h)) / sum(data$p)
  w
}

When I run the function on the entire data set, I get the following answer:

calc(df)
[1] 1.664474

Ideally, I want to return results that are grouped by tm, e.g.:

tm     w
1    result of calc
2    result of calc
3    result of calc

So far I have tried using aggregate with my function, but I get the following error:

aggregate(df, by = list(tm), FUN = calc)
Error in data$d : $ operator is invalid for atomic vectors

I feel like I have stared at this too long and there is an obvious answer.


Solution

  • Using dplyr

    library(dplyr)
    df %>% 
       group_by(tm) %>%
       do(data.frame(val=calc(.)))
    #  tm      val
    #1  1 1.665882
    #2  2 1.504545
    #3  3 1.838000
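
    In dplyr 1.0.0 and later, do() is marked as superseded. A minimal sketch of the same idea with group_modify() follows, assuming the sample data and calc from the question (calc is condensed here but computes the same value). The name res is introduced only for illustration.

```r
library(dplyr)

# Sample data and calc() as defined in the question
set.seed(42)
tm <- c(1, 2, 3, 3, 2, 1, 2, 3, 1, 1)
d <- as.numeric(sample(0:2, size = 10, replace = TRUE))
t <- as.numeric(sample(0:2, size = 10, replace = TRUE))
h <- as.numeric(sample(0:2, size = 10, replace = TRUE))
df <- data.frame(tm, d, t, h)
df$p <- rowSums(df[2:4])

calc <- function(x) {
  (1.27 * sum(x$d) + 1.62 * sum(x$t) + 2.10 * sum(x$h)) / sum(x$p)
}

# group_modify() calls the function once per group; .x holds that group's
# rows (without the grouping column), so calc() receives a data frame.
res <- df %>%
  group_by(tm) %>%
  group_modify(~ data.frame(val = calc(.x))) %>%
  ungroup()

print(res)
```

    The exact values depend on the random draws, which can differ between R versions, so no output is shown here.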
    

    If we change the function slightly so that it takes the columns as separate arguments, this also works with summarise:

     calc1 <- function(d1, t1, h1, p1){
          (1.27*sum(d1) + 1.62*sum(t1) + 2.10*sum(h1) )/sum(p1) }
     df %>%
         group_by(tm) %>% 
         summarise(val=calc1(d, t, h, p))
     #  tm      val
     #1  1 1.665882
     #2  2 1.504545
     #3  3 1.838000
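
  • Using base R

    The aggregate call in the question fails because aggregate() applies FUN to each column of each group separately, so calc receives an atomic vector rather than a data frame. A minimal base R sketch that avoids this splits the data frame by tm and applies calc to each piece. It assumes the sample data and calc from the question; by_group, w and res are names introduced only for illustration.

```r
# Sample data and calc() as defined in the question
set.seed(42)
tm <- c(1, 2, 3, 3, 2, 1, 2, 3, 1, 1)
d <- as.numeric(sample(0:2, size = 10, replace = TRUE))
t <- as.numeric(sample(0:2, size = 10, replace = TRUE))
h <- as.numeric(sample(0:2, size = 10, replace = TRUE))
df <- data.frame(tm, d, t, h)
df$p <- rowSums(df[2:4])

calc <- function(x) {
  (1.27 * sum(x$d) + 1.62 * sum(x$t) + 2.10 * sum(x$h)) / sum(x$p)
}

# split() returns a list with one data frame per value of tm,
# so calc() sees whole rows instead of a single column.
by_group <- split(df, df$tm)
w <- sapply(by_group, calc)

# Reassemble into the tm / w layout asked for in the question
res <- data.frame(tm = as.numeric(names(w)), w = unname(w))
print(res)
```

    This needs no extra packages; the dplyr versions above are more convenient when the result feeds into a longer pipeline.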