rdata.table

Transforming a data.table in R and assigning Scores Based on Start and End Positions


I have a data.table df in R with four columns: id, start, end, score. The start and end range from 1 to 10 and id might have duplicates. I would like to transform this data.table into a new matrix B where each row corresponds to an id, and has 10 columns corresponding to each position from 1 to 10. If a position is within the start and end in df for a particular id, the corresponding cell in B should be the score, otherwise it should be 0.

eg:

library(data.table)
df <- data.table(id = c(1, 1, 2), start = c(1, 5, 1), end = c(3, 7, 2), score = c(10, 20, 30))
df
  id start end score
1  1     1   3    10
2  1     5   7    20
3  2     1   2    30

Expected output:

id  1   2   3   4   5   6   7   8   9   10
1   10  10  10  0   20  20  20  0   0   0
2   30  30  0   0   0   0   0   0   0   0

Solution

  • First extending the data's existing score by row, then adding the missing rows and finally transpose.

    library(data.table)
    
    Range <- 1:10
    
    df[,.(id, Seq = seq(start, end, 1), score), by = seq_along(id)][
      , transpose(.(sapply(Range, \(x) 
                      max(fifelse(x == Seq, score, 0))))), by = id]
          id    V1    V2    V3    V4    V5    V6    V7    V8    V9   V10
       <num> <num> <num> <num> <num> <num> <num> <num> <num> <num> <num>
    1:     1    10    10    10     0    20    20    20     0     0     0
    2:     2    30    30     0     0     0     0     0     0     0     0