For simplicity, dataframe 1 holds 3 genes (a, b, and c) and their positions in the genome. For instance, gene "a" starts at position 1 (min) and ends at 2 (max). Dataframe 2 holds mutation positions, each of which may fall within a gene (between df1$min and df1$max) and impact it:
df1 = data.frame("gene" = c("a","b","c"), "min" = c(1,3,5), "max"=c(2,4,6))
df2 = data.frame("position" = c(1.5,3.5,5.5),"impact" = c("low","low","high"))
I would like to make a dataframe that shows each mutation's position, the gene it is in, and its impact, like so:
position gene impact
1.5 a low
3.5 b low
5.5 c high
Thank you.
Here is a base R option:
transform(
  df2,
  gene = with(
    df1,
    {
      d <- outer(position, min, ">=") & outer(position, max, "<=")
      c(NA, gene)[1 + rowSums(d * col(d))]
    }
  )
)
which gives
  position impact gene
1      1.5    low    a
2      3.5    low    b
3      5.5   high    c
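Here d[i, j] is TRUE when position i lies within gene j, so rowSums(d * col(d)) gives the column index of the matching gene (assuming the genes do not overlap) and 0 when nothing matches. Indexing c(NA, gene) with 1 + that value therefore returns NA in case no matched gene was found.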
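To see the NA fallback in action, here is a minimal sketch that reuses the question's data but swaps in one hypothetical mutation at position 2.5, which falls between gene "a" and gene "b". The 2.5 value and the intermediate names idx and res are my own additions for illustration; the logic is otherwise the same as above, just unrolled into steps.

```r
# Same genes as in the question; position 2.5 is a made-up mutation
# that lies between gene "a" (1-2) and gene "b" (3-4).
df1 <- data.frame(gene = c("a", "b", "c"), min = c(1, 3, 5), max = c(2, 4, 6),
                  stringsAsFactors = FALSE)
df2 <- data.frame(position = c(1.5, 2.5, 5.5), impact = c("low", "low", "high"),
                  stringsAsFactors = FALSE)

# d[i, j] is TRUE when position i falls inside gene j
d <- outer(df2$position, df1$min, ">=") & outer(df2$position, df1$max, "<=")

# Row sums of d * col(d): the matching column index, or 0 when no gene matches
# (this relies on each position matching at most one gene)
idx <- rowSums(d * col(d))

# Shifting by 1 maps index 0 onto the leading NA
res <- transform(df2, gene = c(NA, df1$gene)[1 + idx])
print(res)
```

The row for position 2.5 comes back with gene set to NA, while 1.5 and 5.5 are still assigned to "a" and "c". If genes could overlap, a position inside two genes would sum two column indices and pick the wrong gene, so this approach is only safe for non-overlapping intervals.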