If M
is a numeric matrix, I can order its rows with respect to the lexicographical order by running lexsort(M)
where
lexorder <- function(M) {
do.call(order, lapply(seq_len(ncol(M)), function(i) M[, i]))
}
lexsort <- function(M) {
M[lexorder(M), ]
}
But I'm only interested in getting the greater row (the last one of the ordered matrix). Can we avoid ordering everything to more efficiently extract the last row?
You couls write a recursive function that works faster:
lex_max <- function(M,i=1){
d <- dim(M)
if(d[1] == 1 | i > d[2])M[1,]
else lex_max(M[max(M[,i]) == M[,i],,drop = FALSE], i+1)
}
a <- matrix(sample(500, 1e5, TRUE), 10)
The timings:
Large number of columns:
microbenchmark::microbenchmark(lex_max(a),
OP=lexsort(a)[nrow(a),],lexmaxrow(a), check = 'equal')
Unit: microseconds
expr min lq mean median uq max neval
lex_max(a) 45.7 57.95 80.442 65.55 87.45 306.4 100
OP 15819.5 19437.25 25579.100 22246.55 29002.80 68948.8 100
lexmaxrow(a) 16393.9 18739.65 25210.846 22022.40 29098.75 47731.1 100
a <- matrix(sample(500, 1e5, TRUE), 500)
Unit: microseconds
expr min lq mean median uq max neval
lex_max(a) 5.7 9.90 93.152 12.85 19.70 7124.0 100
OP 577.0 629.75 907.524 699.20 1017.70 7771.4 100
lexmaxrow(a) 470.5 521.05 875.526 619.45 908.85 10049.1 100
Large number of rows
a <- matrix(sample(500, 1e5, TRUE), ncol=10)
Unit: microseconds
expr min lq mean median uq max neval
lex_max(a) 60.2 97.5 137.462 120.0 164.40 650.5 100
OP 594.0 775.9 1359.959 966.8 1251.35 14719.9 100
lexmaxrow(a) 475.1 624.1 1013.927 769.5 936.60 11775.1 100
In all the instances the lex_max
function performs >~10x faster
If you need the position, you could simply do:
which_lexmax <- function(M,i=1, b = seq_len(nrow(M))){
d <- dim(M)
if(d[1] == 1 | i > d[2])b[1]
else lex_max(M[mx <- max(M[,i]) == M[,i],,drop = FALSE], i+1, b[mx])
}
which_lexmax(a)