rmatrixinterpolation

Matrix linear interpolation


I have a matrix with individuals as rows and time points as columns. The value in the matrix is the probability of an event occurring for a subject at each time point.

set.seed(123)
prob_mat <- matrix(round(runif(15), 2), 5, 3,
                   dimnames = list(paste0('id', 1:5), c(1.2, 2.5, 3.1)))

#      1.2  2.5  3.1
# id1 0.29 0.05 0.96
# id2 0.79 0.53 0.45
# id3 0.41 0.89 0.68
# id4 0.88 0.55 0.57
# id5 0.94 0.46 0.10

I also have a time vector named time_vec.

time_vec <- c(1.7, 2.9, 4)

I want to estimate the probabilities for each subject at the time points recorded in time_vec using linear interpolation. For example, the time point 1.7 is between 1.2 and 2.5, with a distance of 0.5 from 1.2 and 0.8 from 2.5, so the interpolated probabilities should be

(prob_mat[, '1.2'] * 0.8 + prob_mat[, '2.5'] * 0.5) / 1.3

#       id1       id2       id3       id4       id5 
# 0.1976923 0.6900000 0.5946154 0.7530769 0.7553846 

Note that the time point 4 is outside the interval [1.2, 3.1]. We use the values at the closest time, i.e. 3.1, as its estimates. The expected output is as follows:

          1.7       2.9    4
id1 0.1976923 0.6566667 0.96
id2 0.6900000 0.4766667 0.45
id3 0.5946154 0.7500000 0.68
id4 0.7530769 0.5633333 0.57
id5 0.7553846 0.2200000 0.10

I have tried apply() with approx() by rows, but the efficiency is low for a large matrix.


Solution

  • We can use approx() to determine the positions of time_vec within prob_mat, and then use the modulo operator (%%) to obtain the decimal part of the position. This value is exactly the weight required for linear interpolation.

    tt <- as.numeric(colnames(prob_mat))
    pos <- approx(x = tt, y = seq_along(tt), xout = time_vec, rule = 2)$y
    w <- pos %% 1
    t(t(prob_mat[, floor(pos)]) * (1-w) + t(prob_mat[, ceiling(pos)]) * w)
    
    #           1.2       2.5  3.1
    # id1 0.1976923 0.6566667 0.96
    # id2 0.6900000 0.4766667 0.45
    # id3 0.5946154 0.7500000 0.68
    # id4 0.7530769 0.5633333 0.57
    # id5 0.7553846 0.2200000 0.10
    

    Note: rule = 2 is set so that when the estimated time point is outside the interval, the value at the closest data extreme is used.