rsortinglexicographic

Get leading element for the lexicographical order


If M is a numeric matrix, I can order its rows with respect to the lexicographical order by running lexsort(M) where

lexorder <- function(M) {
  do.call(order, lapply(seq_len(ncol(M)), function(i) M[, i]))
}
lexsort <- function(M) {
  M[lexorder(M), ]
}

But I'm only interested in getting the greater row (the last one of the ordered matrix). Can we avoid ordering everything to more efficiently extract the last row?


Solution

  • You couls write a recursive function that works faster:

    lex_max <- function(M,i=1){
      d <- dim(M)
      if(d[1] == 1 | i > d[2])M[1,]
      else lex_max(M[max(M[,i]) == M[,i],,drop = FALSE], i+1)
    }
    
    a <- matrix(sample(500, 1e5, TRUE), 10)
    

    The timings:

    Large number of columns:

    microbenchmark::microbenchmark(lex_max(a), 
            OP=lexsort(a)[nrow(a),],lexmaxrow(a), check = 'equal')
    
    Unit: microseconds
             expr     min       lq      mean   median       uq     max neval
       lex_max(a)    45.7    57.95    80.442    65.55    87.45   306.4   100
               OP 15819.5 19437.25 25579.100 22246.55 29002.80 68948.8   100
     lexmaxrow(a) 16393.9 18739.65 25210.846 22022.40 29098.75 47731.1   100
    

    a <- matrix(sample(500, 1e5, TRUE), 500)
    
    Unit: microseconds
             expr   min     lq    mean median      uq     max neval
       lex_max(a)   5.7   9.90  93.152  12.85   19.70  7124.0   100
               OP 577.0 629.75 907.524 699.20 1017.70  7771.4   100
     lexmaxrow(a) 470.5 521.05 875.526 619.45  908.85 10049.1   100
    

    Large number of rows

    a <- matrix(sample(500, 1e5, TRUE), ncol=10)
    
    Unit: microseconds
             expr   min    lq     mean median      uq     max neval
       lex_max(a)  60.2  97.5  137.462  120.0  164.40   650.5   100
               OP 594.0 775.9 1359.959  966.8 1251.35 14719.9   100
     lexmaxrow(a) 475.1 624.1 1013.927  769.5  936.60 11775.1   100
    

    In all the instances the lex_max function performs >~10x faster

    Edit:

    If you need the position, you could simply do:

    which_lexmax <- function(M,i=1, b = seq_len(nrow(M))){
      d <- dim(M)
      if(d[1] == 1 | i > d[2])b[1]
      else lex_max(M[mx <- max(M[,i]) == M[,i],,drop = FALSE], i+1, b[mx])
    }
     which_lexmax(a)