rdata.tablenon-equi-join

Filtering by different ranges by group in data.table


I have some data in a table and I would like to do a non-equi join (I think is the right terminology) and filter it by different ranges for different groups. In the below example I would like to filter group "a" so that it only returns values between 1 and 20 (inclusive) and group "b" so it only returns values between 80 and 100 (inclusive). My reading suggests that inrange should be the item to use. I get how to use it in the generic case, but I am not sure how to get it to run with different ranges by group. (example code adapted from ?inrange)

create sample data

set.seed(1234)
Y = data.table(a=sample(1:100,100), val=runif(100), group=c(rep("a",50),rep("b",50)))
range = data.table(group=c("a","b"),start = c(1,80), end = c(20,100))

Try to filter

Y[inrange(a, range$start, range$end),,by=group]

This obviously does not work and instead applies those ranges to the entire dataset and throws the error message Ignoring by= because j= is not supplied. I think it is clear to me that this is not working because I haven't created the 'join' between the range table and Y, but I am not seeing how to make the two tables communicate grouping via inrange.

Note: In reality values in a will be posixct datetimes, but for the sake of simplicity I am not using that here.


Solution

  • Perhaps:

    Y[range, K := TRUE, on = .(group, a >= start, a <= end)][!is.na(K),]
    #         a        val  group      K
    #     <int>      <num> <char> <lgcl>
    #  1:     9 0.60189755      a   TRUE
    #  2:     5 0.99874081      a   TRUE
    #  3:    16 0.55512663      a   TRUE
    #  4:     4 0.42944396      a   TRUE
    #  5:    14 0.43101637      a   TRUE
    #  6:     3 0.47880269      a   TRUE
    #  7:     2 0.02220682      a   TRUE
    #  8:     6 0.63891131      a   TRUE
    #  9:     8 0.83470266      a   TRUE
    # 10:    17 0.98304402      a   TRUE
    # 11:    98 0.76785547      b   TRUE
    # 12:    94 0.30766574      b   TRUE
    # 13:    88 0.25814665      b   TRUE
    # 14:    89 0.49954639      b   TRUE
    # 15:    83 0.50892062      b   TRUE
    # 16:    95 0.49443856      b   TRUE
    # 17:    97 0.56695890      b   TRUE
    # 18:    87 0.98970989      b   TRUE
    # 19:    82 0.53190509      b   TRUE
    # 20:   100 0.59662376      b   TRUE
    #         a        val  group      K
    

    There are other ways to do this, but they involve renaming or loss of information. For instance,