[SOLVED] Getting mean of multiple rows based on interval dataframe in R

Getting mean of multiple rows based on interval dataframe in R

Let's say I have the following dataframe

df1=read.table(text="ID POSITION S1 S2 
1   1 10 10
1   2 20 0
1   3 10 0
1   4 20 0
1   5 10 50
2   1 10 0
2   2 20 10
2   3 20 10
2   4 20 10
2   5 20 10", header=TRUE)

ID and POSITION are ordered by ID then POSITION

Then another dataframe with coordinates

df2=read.table(text="ID POSITION_START POSITION_END
1  1              3
1  4              5
2  1              5", header=TRUE)

How can I get the means of S1 and S2 for those specific intervals based on ID and position from df1?

Final dataframe would be

df3=read.table(text="ID POSITION_START POSITION_END S1 S2
1  1              3            13.33  3.3
1  4              5            15  25
2  1              5            18  8", header=TRUE)

Where S1 and S2 would be means for the intervals (first row is ID=1 rows 1 to 3) I think it might be important that positions are not always equal to row number

I've tried using the iRanges library to no avail

Solution

With data.table, you can try

setDT(df1)
setDT(df2)

df1[
  df2,
  .(ID, POSITION_START, POSITION_END, S1, S2),
  on = .(ID, POSITION >= POSITION_START, POSITION <= POSITION_END)
][, lapply(.SD, mean), by = ID:POSITION_END]

and you will obtain

      ID POSITION_START POSITION_END       S1        S2
   <int>          <int>        <int>    <num>     <num>
1:     1              1            3 13.33333  3.333333
2:     1              4            5 15.00000 25.000000
3:     2              1            5 18.00000  8.000000

With base R, you can use aggregate + subset + merge, e.g.,

aggregate(
  . ~ ID + POSITION_START + POSITION_END,
  subset(
    merge(df1, df2, all.x = TRUE),
    POSITION >= POSITION_START & POSITION <= POSITION_END,
    select = -POSITION
  ), mean
)

which gives

  ID POSITION_START POSITION_END       S1        S2
1  1              1            3 13.33333  3.333333
2  2              1            5 18.00000  8.000000
3  1              4            5 15.00000 25.000000