Aggregate / summarize multiple variables per group (e.g. sum, mean)


From a data frame, is there an easy way to aggregate (sum, mean, max, etc.) multiple variables simultaneously?

Below are some sample data:

library(lubridate)
days = 365*2
date = seq(as.Date("2000-01-01"), length = days, by = "day")
year = year(date)
month = month(date)
x1 = cumsum(rnorm(days, 0.05)) 
x2 = cumsum(rnorm(days, 0.05))
df1 = data.frame(date, year, month, x1, x2)

I would like to aggregate the x1 and x2 variables from the df1 data frame by year and month at the same time. The following code aggregates the x1 variable; is it also possible to aggregate the x2 variable in the same call?

### aggregate x1 by year and month
df2=aggregate(x1 ~ year+month, data=df1, sum, na.rm=TRUE)
head(df2)
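For the base R aggregate() call above, a common approach is to put cbind(x1, x2) on the left-hand side of the formula so that both variables are summarised in one pass. This is a minimal sketch, assuming base R format() in place of lubridate's year()/month() so that it runs without extra packages; the seed is arbitrary and only there for reproducibility:

```r
# Sketch: base R aggregate() with several response variables.
# Assumption: year/month are derived with format() instead of lubridate
# so this block runs without extra packages.
set.seed(42)  # arbitrary seed, only for reproducibility
days  <- 365 * 2
date  <- seq(as.Date("2000-01-01"), length.out = days, by = "day")
year  <- as.integer(format(date, "%Y"))
month <- as.integer(format(date, "%m"))
x1    <- cumsum(rnorm(days, 0.05))
x2    <- cumsum(rnorm(days, 0.05))
df1   <- data.frame(date, year, month, x1, x2)

# cbind() on the left-hand side sums x1 and x2 within each year/month group
df2 <- aggregate(cbind(x1, x2) ~ year + month, data = df1, FUN = sum, na.rm = TRUE)
head(df2)
```

With the formula interface, the default na.action = na.omit drops any row where x1, x2, or a grouping variable is NA before FUN is applied, so na.rm = TRUE only has an effect if you also pass na.action = na.pass. Writing `. ~ year + month` instead of cbind() would also try to sum the date column, which fails for Date objects.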

Solution

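    You could also use the reshape2 package for this task: melt() the data into long format, then dcast() it back into wide format, summing each variable within each year/month combination: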

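    library(reshape2)
    # long format: one row per date and variable (x1 or x2)
    df_melt <- melt(df1, id.vars = c("date", "year", "month"))
    # back to wide format, summing each variable within year/month
    dcast(df_melt, year + month ~ variable, fun.aggregate = sum)
    #    year month         x1           x2
    # 1  2000     1  -80.83405 -224.9540159
    # 2  2000     2 -223.76331 -288.2418017
    # 3  2000     3 -188.83930 -481.5601913
    # 4  2000     4 -197.47797 -473.7137420
    # 5  2000     5 -259.07928 -372.4563522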
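The question also mentions mean and max. One base R option, sketched here under the same simulated-data assumption (base R format() instead of lubridate, arbitrary seed), is to let FUN return a named vector. aggregate() then stores each variable as a matrix column, and do.call(data.frame, ...) flattens those into ordinary columns such as x1.sum and x1.mean:

```r
# Sketch: several statistics for several variables in one base R call.
# Assumption: same simulated data as the question, with base R year/month.
set.seed(42)  # arbitrary seed, only for reproducibility
days <- 365 * 2
date <- seq(as.Date("2000-01-01"), length.out = days, by = "day")
df1  <- data.frame(
  date  = date,
  year  = as.integer(format(date, "%Y")),
  month = as.integer(format(date, "%m")),
  x1    = cumsum(rnorm(days, 0.05)),
  x2    = cumsum(rnorm(days, 0.05))
)

# FUN returns a named vector, so x1 and x2 each become a 3-column matrix column
stats <- aggregate(cbind(x1, x2) ~ year + month, data = df1,
                   FUN = function(v) c(sum = sum(v), mean = mean(v), max = max(v)))

# do.call(data.frame, ...) flattens the matrix columns into x1.sum, x1.mean, ...
stats_flat <- do.call(data.frame, stats)
head(stats_flat)
```

Without the do.call() step the result still prints sensibly, but columns like x1 are matrices, which many downstream functions do not expect.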