rdata.table

How to calculate conditional counts and sums?


I want to do something similar of COUNTIFS and SUMIFS in R using data.table package.

Here is my data.

library(data.table)
treesData<-as.data.table(trees)
bins<-seq(63, length.out = 10, by = 3);
aggtable <- data.table(bin1 = bins[-length(bins)], bin2 = bins[-1])

This is what I want to achieve: Count is countifs on height if it is in the interval and Girth.total is sum on girth.

Output screenshot


Solution

  • An alternative is to use a non-equi join, and leverage .EACHI:

    treesData[aggtable,on=.(Height>=bin1, Height<bin2),.(ct = .N, Girth = sum(Girth)),.EACHI]
    

    Output:

       Height Height    ct Girth
        <num>  <num> <int> <num>
    1:     63     66     3  31.2
    2:     66     69     1  11.0
    3:     69     72     3  33.7
    4:     72     75     4  53.9
    5:     75     78     6  73.3
    6:     78     81     7 104.5
    7:     81     84     4  56.3
    8:     84     87     2  26.2
    9:     87     90     1  20.6
    

    Note that non-equi join's in data.table can be confusing with regards to naming. A more complete option could be:

    aggtable = treesData[aggtable,on=.(Height>=bin1, Height<bin2),.(ct = .N, Girth = sum(Girth)),.EACHI]
    setnames(aggtable, new=c("Bin1", "Bin2", "Count", "Girth.sum"))