Starting with these two data frames:
data <- data.frame("Run_ID" = c(1,2,3), "Sample" = c("A", "B", "C"), "Value" = c(1,2,3))
metadata <- data.frame("Run_ID" = c(1,3), "Sample" = c("A","C"))
I would like to subset data
so that it only includes the values from the pairs of Run_ID
+ Sample
which also are present in metadata
. The output should contain the same columns as data
.
Expected Output:
Run_ID Sample Value
1 A 1
3 C 3
Based on the documentation, it seems like semi_join()
should be the solution, but I cannot figure out join based on the two variables.
>semi_join(data, metadata, by = c("Run_ID", "Sample"))
[1] Run_ID Sample Value
<0 rows> (or 0-length row.names)
Any suggestions are greatly appreciated!
Your code is ok but the input metadata
is not in a friendly format but I guess this is what you're after:
semi_join(
data,
metadata %>% separate_rows(Sample, sep = ','),
by = c('Run_ID', 'Sample')
)
# Run_ID Sample Value
# 1 1 A 1
# 2 3 C 3