r for-loop matrix indexing vectorization

How to remove and speed up a for loop over matrix columns by vectorisation?

Intro Consider a tournament with four players and the following schedule:

Rd 1: 1-4, 2-3.
Rd 2: 4-3, 1-2.
Rd 3: 2-4, ---.

The usual way to display tournament results is the cross table, for example:

  N P|A B C D|Tot.
  ---+-------+---
  1 A|x 0 . 1| 1
  2 B|0 x 2 2| 4    
  3 C|. 0 x 2| 2
  4 D|1 0 0 x| 1
  ---+-------+---
 Tot.|1 0 2 5| 8

For- and against-scores can be determined directly as the sum of rows and columns as shown above.

An alternative display is the format of the game file. For each player and round, it shows the opponent, the colour or away/home and the result:

      Rd1 Rd2 Rd3 Tot.
  1 A 4w1 2w0 ---  1
  2 B 3w2 1b0 4w2  4
  3 C 2b0 4b2 ---  2
  4 D 1b1 3w0 2b0  1
 Tot.              8

The above structure can be represented in two tables opponents and results, and possibly also colours if relevant. Determining the sum of the scores by player is straightforward.

For some applications, the sum of loss points must also be determined. If the sum of win and loss score has a fixed value (e.g. c) then the against score can be determined directly, as noted by @shtz --> @Friede. But this is often not the case, even in chess where 0 - 0 is a valid result.

A possible solution is to transpose results, i.e. swap a player's result with the opponent's score. Then we determine the sum of the rows, which are the loss scores.

Swapping can be achieved by a for loop as in the function t.sparse below.

Of course, the question arises: what does an R-style solution look like?

Function

# Reverse wins and losses in matrix m
# by swapping opponents within columns.
t.sparse <- function(m, opp) {
  for (k in seq_len(ncol(m))) {
    m[, k] <- m[, k][opp[, k]]
  }
  return(m)
}

Data

# Indexed by player, round.
# First round: Player 1 meets player 4, player 2 meets player 3.
opponents <- matrix(c(4, 2, NA,
                      3, 1, 4,
                      2, 4, NA,
                      1, 3, 2
                     ), nrow = 4, byrow = TRUE
                   )
# Player 1 draws player 4, player 2 beats player 3 in first round
results   <- matrix(c(1, 0, NA, 
                      2, 0, 2,
                      0, 2, NA,
                      1, 0, 0
                     ), nrow = 4, byrow = TRUE
                    )

# In the results matrix, outcomes between two players are reversed.
t.sparse(results, opponents)
# Gives:
#      [,1] [,2] [,3]
# [1,]    1    0   NA
# [2,]    0    0    0
# [3,]    2    0   NA
# [4,]    1    2    2

The matrices are modest in size: 1300 x 10 (max), 125 x 12, 40 x 39

#Githubs copilot suggests

tv.sparse <- function(m, opp) {
  m <- m[cbind(seq_along(m), opp)]
  return(m)
}

This is obviously not equivalent to the original function.

Q: How to do this in a proper way?

Benchmark

tv.sparse <-  function(m, opp) {
  res <- matrix(NA, nrow = nrow(m), ncol = ncol(m))
  col_ids <- matrix(rep(seq_len(ncol(m)), each = nrow(m)), nrow = nrow(m)) 
  valid <- !is.na(opp)
  res[valid] <- m[cbind(opp[valid], col_ids[valid])]
  return(res)
}

library(bench)
bench::mark(
  "forloop" = ttt <-  t.sparse(results, opponents),
  "vect1"   = vv1 <- tv.sparse(results, opponents),
  check=TRUE
)

## A tibble: 2 × 13
#  expression      min median `itr/sec` mem_alloc `gc/sec` n_itr  n_gc total_time
#  <bch:expr> <bch:tm> <bch:>     <dbl> <bch:byt>    <dbl> <int> <dbl>   <bch:tm>
#1 forloop       4.6µs  5.4µs   179329.        0B     89.7  9995     5     55.7ms
#2 vect1         8.5µs  9.6µs   101872.    59.8KB     91.8  9991     9     98.1ms
## ℹ 4 more variables: result <list>, memory <list>, time <list>, gc <list>

identical(ttt, vv1)
# [1] TRUE

Solution

For each player you're looking up the opponent in each round from the opponents matrix and the finding the corresponding result in the results matrix. This can be vectorised by using opponents to select the desired row and the base col() function to represent the appropriate column:

get_output <- function(opponents, results) {
    indices <- matrix(c(opponents, col(opponents)), ncol = 2)
    matrix(
        results[indices],
        ncol = ncol(opponents)
    )
}

Each row of the indices matrix represents the desired row and column, e.g. row 1 is the result for the first opponent of player one (player four) in the first round:

#       [,1] [,2]
#  [1,]    4    1

Subsetting results by these indices returns the desired output:

identical(
    get_output(opponents, results),
    t.sparse(results, opponents)
)
# [1] TRUE

Benchmark

Here is a benchmark with 10 to 100,000 opponents and 10 to 110 rounds:

out <- bench::press(
    n_players = 10^c(1:5),
    n_rounds = seq(from = 10, to = 110, by = 25),
    {
        opponents <- sample(n_rounds, size = n_rounds * n_players, replace = TRUE) |>
            matrix(ncol = n_players)
        results <- sample(c(0, 1, 2), n_rounds * n_players, replace = TRUE) |>
            matrix(ncol = n_players)
        bench::mark(
            t.sparse(results, opponents),
            get_output(opponents, results)
        )
    }
)

This is a quicker than your approach. Note that times are on a log scale, so for example with 100k opponents and 110 rounds the median time for this approach is 143ms compared with 997ms using a loop:

Full benchmark output:

# A tibble: 50 × 9
   expression                     n_players n_rounds      min   median `itr/sec` mem_alloc `gc/sec` n_itr
   <bch:expr>                         <dbl>    <dbl> <bch:tm> <bch:tm>     <dbl> <bch:byt>    <dbl> <int>
 1 t.sparse(results, opponents)          10       10     17µs   20.4µs  45816.        848B    18.3   9996
 2 get_output(opponents, results)        10       10   9.43µs  10.29µs  89950.      4.19KB     9.00  9999
 3 t.sparse(results, opponents)         100       10 145.46µs 187.75µs   5149.      7.86KB    17.0   2425
 4 get_output(opponents, results)       100       10  24.11µs  26.21µs  34482.     39.34KB     3.45  9999
 5 t.sparse(results, opponents)        1000       10 789.04µs  829.1µs    893.     78.17KB    28.5    376
 6 get_output(opponents, results)      1000       10  78.91µs  81.71µs  11930.    390.91KB    16.7   5726
 7 t.sparse(results, opponents)       10000       10   8.04ms   8.13ms    122.     781.3KB    56.3     39
 8 get_output(opponents, results)     10000       10 810.83µs 851.79µs   1166.      3.81MB    18.9    555
 9 t.sparse(results, opponents)      100000       10  85.62ms  86.26ms     11.6     7.63MB    36.6      6
10 get_output(opponents, results)    100000       10   7.97ms    9.4ms    100.     38.15MB    21.6     51
11 t.sparse(results, opponents)          10       35  12.22µs  13.24µs  71713.     16.69KB    21.5   9997
12 get_output(opponents, results)        10       35   6.37µs   6.95µs 139477.     13.95KB    13.9   9999
13 t.sparse(results, opponents)         100       35 109.97µs 115.56µs   8427.    166.45KB    26.7   3471
14 get_output(opponents, results)       100       35  30.49µs  32.12µs  30153.       137KB    18.1   9994
15 t.sparse(results, opponents)        1000       35   1.13ms   1.17ms    848.      1.62MB    28.0    333
16 get_output(opponents, results)      1000       35 300.09µs 313.72µs   3168.      1.33MB    18.9   1511
17 t.sparse(results, opponents)       10000       35  11.31ms  11.67ms     85.8    16.25MB   132.      13
18 get_output(opponents, results)     10000       35   3.21ms   3.29ms    303.     13.35MB    36.1    126
19 t.sparse(results, opponents)      100000       35 504.39ms 504.39ms      1.98  162.51MB     1.98     1
20 get_output(opponents, results)    100000       35  40.63ms  42.72ms     20.7   133.51MB    26.3     11
21 t.sparse(results, opponents)          10       60  14.93µs   16.2µs  59672.      26.3KB    11.9   9998
22 get_output(opponents, results)        10       60   8.26µs   9.44µs 100450.     23.72KB    10.0   9999
23 t.sparse(results, opponents)         100       60 133.79µs 142.07µs   6813.    262.55KB    11.9   2865
24 get_output(opponents, results)       100       60  48.86µs  51.82µs  18339.    234.66KB    19.9   6449
25 t.sparse(results, opponents)        1000       60   1.32ms   1.39ms    715.      2.56MB     6.99   307
26 get_output(opponents, results)      1000       60 506.89µs 527.29µs   1707.      2.29MB    17.0    804
27 t.sparse(results, opponents)       10000       60  13.42ms  13.66ms     72.9    25.64MB    10.4     28
28 get_output(opponents, results)     10000       60   5.51ms   5.72ms    172.     22.89MB    34.4     65
29 t.sparse(results, opponents)      100000       60 175.32ms  290.8ms      3.44  256.35MB     3.44     2
30 get_output(opponents, results)    100000       60  97.38ms 124.69ms      5.50  228.88MB     7.33     3
31 t.sparse(results, opponents)          10       85  16.89µs  18.87µs  43977.     36.22KB     0    10000
32 get_output(opponents, results)        10       85  10.13µs  11.52µs  70925.     33.48KB     7.09  9999
33 t.sparse(results, opponents)         100       85 153.67µs 163.72µs   5712.    361.77KB     4.29  1333
34 get_output(opponents, results)       100       85  68.38µs  72.85µs   9163.    332.31KB     4.36  4208
35 t.sparse(results, opponents)        1000       85   1.56ms   1.66ms    596.      3.53MB     4.73   252
36 get_output(opponents, results)      1000       85 719.87µs 747.88µs   1330.      3.24MB     8.76   607
37 t.sparse(results, opponents)       10000       85  15.39ms   15.8ms     62.5    35.32MB     4.81    26
38 get_output(opponents, results)     10000       85   7.96ms   8.09ms    111.     32.42MB     9.47    47
39 t.sparse(results, opponents)      100000       85 194.21ms 211.06ms      4.78  353.24MB     7.97     3
40 get_output(opponents, results)    100000       85 167.47ms 462.31ms      2.16  324.25MB     3.24     2
41 t.sparse(results, opponents)          10      110  18.68µs  20.75µs  46004.     45.83KB     4.60  9999
42 get_output(opponents, results)        10      110  11.91µs  13.23µs  68648.     43.25KB     0    10000
43 t.sparse(results, opponents)         100      110 171.37µs 186.09µs   5215.    457.86KB     4.84  2156
44 get_output(opponents, results)       100      110  87.14µs  92.02µs   6690.    429.97KB     4.22  3173
45 t.sparse(results, opponents)        1000      110    1.9ms   1.96ms    510.      4.47MB     8.10    63
46 get_output(opponents, results)      1000      110 934.73µs   1.93ms    649.       4.2MB     2.09   311
47 t.sparse(results, opponents)       10000      110   17.4ms  17.89ms     55.4    44.71MB     5.03    22
48 get_output(opponents, results)     10000      110  10.43ms  10.68ms     78.4    41.96MB     4.48    35
49 t.sparse(results, opponents)      100000      110 997.75ms 997.75ms      1.00  447.08MB     1.00     1
50 get_output(opponents, results)    100000      110 135.68ms  143.2ms      6.71  419.62MB     6.71     4