Multi-state analysis in R using mstate

I am trying to construct a long-format data frame for multi-state analysis in R with the mstate package, using the following code:

tmat <- transMat(x = list( c(2, 10), c(3, 10), c(4, 10), c(5, 10), c(6, 10), c(7, 10), c(8, 10), c(9, 10), c(10), c()),
                 names = c("start", "aki_1", "rec_1", "aki_2", "rec_2", "aki_3", "rec_3", "aki_4", "rec_4", "death"))

tmat
from    start aki_1 rec_1 aki_2 rec_2 aki_3 rec_3 aki_4 rec_4 death
  start    NA     1    NA    NA    NA    NA    NA    NA    NA     2
  aki_1    NA    NA     3    NA    NA    NA    NA    NA    NA     4
  rec_1    NA    NA    NA     5    NA    NA    NA    NA    NA     6
  aki_2    NA    NA    NA    NA     7    NA    NA    NA    NA     8
  rec_2    NA    NA    NA    NA    NA     9    NA    NA    NA    10
  aki_3    NA    NA    NA    NA    NA    NA    11    NA    NA    12
  rec_3    NA    NA    NA    NA    NA    NA    NA    13    NA    14
  aki_4    NA    NA    NA    NA    NA    NA    NA    NA    15    16
  rec_4    NA    NA    NA    NA    NA    NA    NA    NA    NA    17
  death    NA    NA    NA    NA    NA    NA    NA    NA    NA    NA

dlong <- msprep(time = c(NA, "aki_1_time",  "rec_1_time", "aki_2_time",  "rec_2_time", 
                         "aki_3_time",  "rec_3_time", "aki_4_time",  "rec_4_time", "death_time"),
                status = c(NA, "aki_1_status",  "rec_1_status", "aki_2_status",  "rec_2_status", 
                           "aki_3_status",  "rec_3_status", "aki_4_status",  "rec_4_status", "death_status"),
                data = d, id = "subject", trans = tmat)

However, msprep keeps returning: Error in time[, -startings] : incorrect number of dimensions. I checked that all variables are present in the dataset and correctly spelled, and that there are no missing values. I also believe the transition matrix is specified correctly.

I thought it might have to do with the starting state "start", for which I filled in NA for both time and status, but this is how it should be done according to the R documentation.

The dataset looks like this:

    subject aki_1_status aki_1_time rec_1_status rec_1_time aki_2_status aki_2_time rec_2_status rec_2_time aki_3_status aki_3_time rec_3_status rec_3_time
1 1            0       90.2            0       90.2            0       90.2            0       90.2            0       90.2            0       90.2
2 2            0       90.2            0       90.2            0       90.2            0       90.2            0       90.2            0       90.2
3 4            1        6.1            0       90.2            0       90.2            0       90.2            0       90.2            0       90.2
4 5            1        2.1            1       10.1            0       90.2            0       90.2            0       90.2            0       90.2
5 6            1        3.1            1       11.1            1       31.1            1       47.1            0       90.2            0       90.2
6 8            1        1.1            0       90.2            0       90.2            0       90.2            0       90.2            0       90.2
  aki_4_status aki_4_time rec_4_status rec_4_time death_status death_time
1            0       90.2            0       90.2            0       90.2
2            0       90.2            0       90.2            0       90.2
3            0       90.2            0       90.2            1       11.2
4            0       90.2            0       90.2            0       90.2
5            0       90.2            0       90.2            0       90.2
6            0       90.2            0       90.2            1        2.2

Does anybody have a solution for this?

Solution

This could be a bug in the package or a problem with your input format. I don't understand the structure of the input data, but I traced the error message and added a line to the function that prevents the error while still producing output that appears to match your input data. So if you're sure that the input format is correct and the output I got is correct, then it's likely a bug in the package. Otherwise, you'll need to revisit the documentation to make sure you're specifying the data correctly.

If you look at the code for mstate:::msprepEngine, it assumes that the time argument is a matrix. However, in the final iteration, when only one row is left, time becomes a vector (representing the last row of the matrix), so the two-index subsetting time[, -startings] fails.
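The mechanism can be reproduced in base R without mstate. The following is a minimal sketch on a made-up matrix (the values and the name one_row are illustrative, not taken from the package): selecting a single row drops the dim attribute by default, and two-index subsetting on the resulting vector raises the same error.

```r
# Toy 3 x 4 matrix standing in for the `time` matrix (values are made up).
time <- matrix(1:12, nrow = 3)

# Selecting a single row drops the dimension attribute by default (drop = TRUE).
one_row <- time[2, ]
is.matrix(one_row)   # FALSE
dim(one_row)         # NULL

# Two-index subsetting on a plain vector fails; capture the message
# instead of stopping the script.
err <- tryCatch(one_row[, -1], error = function(e) conditionMessage(e))
err
```

In an English locale, err should hold the same "incorrect number of dimensions" message that msprep reports.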

I can prevent the error by adding a line to the msprepEngine function that changes time back into a matrix just before the call to Recall.

The last two statements of msprepEngine then become:

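# Added guard: time may have collapsed to a plain vector when a single
# row is left, so restore the one-row matrix shape before subsetting.
if (!is.matrix(time)) 
    time <- matrix(time, nrow = 1)
Recall(time = time[, -startings], status = status[, -startings], 
    id = id, starttime = newtime, startstate = newstate, 
    trans = trans[-startings, -startings], originalStates = originalStates[-startings], 
    longmat = longmat)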
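As a hedged illustration of what this guard does and does not do, here is a base R sketch on a made-up matrix (not the package's data, and the names restored, kept, after_a and after_b are hypothetical). Rebuilding a one-row matrix avoids the error for one subsetting step, but the result of that step collapses to a vector again unless drop = FALSE is used.

```r
# Toy stand-in for the `time` matrix (values are made up).
time <- matrix(1:12, nrow = 3)

# Option A, as in the patch: rebuild a one-row matrix from the collapsed vector.
one_row  <- time[2, ]
restored <- matrix(one_row, nrow = 1)
dim(restored)                         # 1 4
after_a  <- restored[, -1]            # no error, but a plain vector again
is.matrix(after_a)                    # FALSE

# Option B: keep the matrix shape at every subsetting step.
kept    <- time[2, , drop = FALSE]
after_b <- kept[, -1, drop = FALSE]
dim(after_b)                          # 1 3
```

Note that status is subset in the same way in the Recall call, so it may need the same treatment. Whether the package code needs drop = FALSE at these points is an assumption I have not verified against its source.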

Then the function runs and I get:

> dlong
An object of class 'msdata'

Data:
   subject from to trans Tstart Tstop time status
1        1    1  2     1    0.0  90.2 90.2      0
2        1    1 10     2    0.0  90.2 90.2      0
3        2    1  2     1    0.0  90.2 90.2      0
4        2    1 10     2    0.0  90.2 90.2      0
5        4    1  2     1    0.0   6.1  6.1      1
6        4    1 10     2    0.0   6.1  6.1      0
7        4    2  3     3    6.1  11.2  5.1      0
8        4    2 10     4    6.1  11.2  5.1      1
9        5    1  2     1    0.0   2.1  2.1      1
10       5    1 10     2    0.0   2.1  2.1      0
11       5    2  3     3    2.1  10.1  8.0      1
12       5    2 10     4    2.1  10.1  8.0      0
13       5    3  4     5   10.1  90.2 80.1      0
14       5    3 10     6   10.1  90.2 80.1      0
15       6    1  2     1    0.0   3.1  3.1      1
16       6    1 10     2    0.0   3.1  3.1      0
17       6    2  3     3    3.1  11.1  8.0      1
18       6    2 10     4    3.1  11.1  8.0      0
19       6    3  4     5   11.1  31.1 20.0      1
20       6    3 10     6   11.1  31.1 20.0      0
21       8    1  2     1    0.0   1.1  1.1      1
22       8    1 10     2    0.0   1.1  1.1      0
23       8    2  3     3    1.1   2.2  1.1      0
24       8    2 10     4    1.1   2.2  1.1      1

However, I have no idea whether this is the correct output! The transitions with status==1 look like they match the transitions in your sample data frame, but I don't understand the rest of the input or output formats (it looks like there are nonsensical 'censored' transitions in there, but they could be OK, I don't know). You could check it yourself or contact the package authors.
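One rough way to check the output is to compare the number of observed events per subject in the wide input with the number of status==1 rows in the long output. The sketch below hard-codes the status values shown above (the aki_3 to rec_4 columns are omitted because they are all 0 in the sample). The names wide, long, events_wide, events_long and mismatch are illustrative, and the check assumes that every status of 1 in the wide data is a transition the subject actually made.

```r
# Status columns for the six subjects shown in the question.
wide <- data.frame(
  subject      = c(1, 2, 4, 5, 6, 8),
  aki_1_status = c(0, 0, 1, 1, 1, 1),
  rec_1_status = c(0, 0, 0, 1, 1, 0),
  aki_2_status = c(0, 0, 0, 0, 1, 0),
  rec_2_status = c(0, 0, 0, 0, 1, 0),
  death_status = c(0, 0, 1, 0, 0, 1)
)

# subject and status columns copied from the printed dlong output.
long <- data.frame(
  subject = rep(c(1, 2, 4, 5, 6, 8), times = c(2, 2, 4, 6, 6, 4)),
  status  = c(0, 0,  0, 0,  1, 0, 0, 1,  1, 0, 1, 0, 0, 0,
              1, 0, 1, 0, 1, 0,  1, 0, 0, 1)
)

# Count observed events per subject in each representation.
status_cols <- grep("_status$", names(wide), value = TRUE)
events_wide <- setNames(rowSums(wide[status_cols]), wide$subject)
events_long <- tapply(long$status, long$subject, sum)

# Subjects whose event counts differ between input and output.
differs  <- as.vector(events_wide != events_long[names(events_wide)])
mismatch <- names(events_wide)[differs]
mismatch
```

With these values the only subject flagged should be 6: the wide record has four events (aki_1, rec_1, aki_2, rec_2), while the long output has three rows with status 1 and no rows leaving state 4. That suggests the patched function may stop early for that subject, so the output above should not be trusted without further checking.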