I have a DF of logical vectors as follows:
DF <- data.frame(c(T,T,F), c(T,F,T), c(F,T,F))
I want to find row-column pairs under the condition that the combination has a TRUE value.
So, DF[1,2]
represents a possible pair, but DF[2,2]
does not.
Once in pair, the row and the column are excluded to make new pairs.
Depending on the data-set, there with be different pairing possibilities. It may also be impossible to find a pair for all the rows or columns.
My question is: What kind of algorithm/library can I use to maximize the quantity of pairs?
In the example given, the pairing solution would be this one:
DF[3,2]
DF[2,3]
DF[1,1]
This is the maximum matching problem of a bipartite graph formed by the rows and columns connected according to the matrix DF
. There are a few packages you can use for this problem. One option is the RcppHungarian
package, with HungarianSolver
.
DF <- data.frame(c(T,T,F), c(T,F,T), c(F,T,F))
RcppHungarian::HungarianSolver(!DF)$pairs
#> [,1] [,2]
#> [1,] 1 1
#> [2,] 2 3
#> [3,] 3 2