Let's say I have the following dataframe
df1=read.table(text="ID POSITION S1 S2
1 1 10 10
1 2 20 0
1 3 10 0
1 4 20 0
1 5 10 50
2 1 10 0
2 2 20 10
2 3 20 10
2 4 20 10
2 5 20 10", header=TRUE)
ID and POSITION are ordered by ID then POSITION
Then another dataframe with coordinates
df2=read.table(text="ID POSITION_START POSITION_END
1 1 3
1 4 5
2 1 5", header=TRUE)
How can I get the means of S1 and S2 for those specific intervals based on ID and position from df1?
Final dataframe would be
df3=read.table(text="ID POSITION_START POSITION_END S1 S2
1 1 3 13.33 3.3
1 4 5 15 25
2 1 5 18 8", header=TRUE)
Where S1 and S2 would be means for the intervals (first row is ID=1 rows 1 to 3) I think it might be important that positions are not always equal to row number
I've tried using the iRanges library to no avail
data.table
, you can trysetDT(df1)
setDT(df2)
df1[
df2,
.(ID, POSITION_START, POSITION_END, S1, S2),
on = .(ID, POSITION >= POSITION_START, POSITION <= POSITION_END)
][, lapply(.SD, mean), by = ID:POSITION_END]
and you will obtain
ID POSITION_START POSITION_END S1 S2
<int> <int> <int> <num> <num>
1: 1 1 3 13.33333 3.333333
2: 1 4 5 15.00000 25.000000
3: 2 1 5 18.00000 8.000000
aggregate
+ subset
+ merge
, e.g.,aggregate(
. ~ ID + POSITION_START + POSITION_END,
subset(
merge(df1, df2, all.x = TRUE),
POSITION >= POSITION_START & POSITION <= POSITION_END,
select = -POSITION
), mean
)
which gives
ID POSITION_START POSITION_END S1 S2
1 1 1 3 13.33333 3.333333
2 2 1 5 18.00000 8.000000
3 1 4 5 15.00000 25.000000