
Is there no "multiple match vector" function in R?


I was trying to find a "readily available" function to do the following:

> my_array = c(5,9,11,10,6,5,9,13)
> my_array
[1]  5  9 11 10  6  5  9 13
> my_test <- c(5, 6)
> new_match_function(my_test, my_array)
[1] 1 5 6
# or instead, maybe:
# [[1]]
# [1] 1 6
# [[2]]
# [1] 5

For my purposes, %in% is close enough, since it will return:

> my_array %in% my_test
[1]  TRUE FALSE FALSE FALSE  TRUE  TRUE FALSE FALSE

and I could just do:

> seq(length(my_array))[my_array %in% my_test]
[1] 1 5 6

But it just seems that something like match should provide this capability: a means to return multiple elements from the match.


If I were to create a package simply to provide this solution, it would not be widely adopted (for good reason: this tiny use case is not worth installing a package for).

Is there a solution already available? If not, where is a good place for me to add this? As I showed, it's easy enough to solve without a new function, but for match to not allow for multiple matches seems crazy. I'd ideally like to either:

  1. Find out that I'm wrong and there is a direct function to accomplish this, or
  2. Be able to alter match itself so that it can return multiple occurrences.

But my impression (right or wrong) has been that any adjustments to the base code are more trouble than they are worth.


Solution

  • For simple cases, which(my_array %in% my_test) or lapply(my_test, function(x) which(my_array == x)) work fine, but neither is the most efficient option.

    For the first case (just knowing which positions match, without seeing which element of my_test each one corresponds to), the fastmatch package may help. Its %fin% ("fast in") function caches a hash table of the table being searched, so that subsequent lookups against the same table are more efficient.

    For the second case, there is findMatches in the Bioconductor package S4Vectors. (https://bioconductor.org/packages/release/bioc/html/S4Vectors.html)

    Note that this function doesn't return a list, but a Hits object. To get a list, you need the Bioconductor package IRanges as well (and use as.list). (https://bioconductor.org/packages/release/bioc/html/IRanges.html)
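
To make the options above concrete, here is a minimal sketch using the data from the question. The helper name match_all is hypothetical (it is not part of base R or of any package), and the two package sections are guarded so they only run if fastmatch or S4Vectors happens to be installed. Grouping the Hits object with split is an assumption of this sketch, used in place of the as.list route mentioned above.

```r
my_array <- c(5, 9, 11, 10, 6, 5, 9, 13)
my_test  <- c(5, 6)

# Case 1: all positions in my_array whose value occurs in my_test.
all_positions <- which(my_array %in% my_test)
print(all_positions)  # 1 5 6

# Case 2: positions grouped per element of my_test.
# match_all is a hypothetical helper name, not an existing function.
match_all <- function(x, table) {
  lapply(x, function(value) which(table == value))
}
print(match_all(my_test, my_array))  # list(c(1, 6), 5)

# Optional: fastmatch's %fin% as a drop-in replacement for %in%.
if (requireNamespace("fastmatch", quietly = TRUE)) {
  library(fastmatch)
  print(which(my_array %fin% my_test))
}

# Optional: S4Vectors::findMatches returns a Hits object of (query, subject) pairs.
# Splitting the subject indices by query index gives a plain list, but note
# that elements of my_test with no match are dropped by split().
if (requireNamespace("S4Vectors", quietly = TRUE)) {
  hits <- S4Vectors::findMatches(my_test, my_array)
  print(split(S4Vectors::subjectHits(hits), S4Vectors::queryHits(hits)))
}
```

The base R versions rescan my_array once per lookup, which is fine for small inputs. The package routes are mainly worth considering when the same large table is searched repeatedly.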