r sparse-matrix predict visitor-statistic

Creating a sparse matrix and then running predictive analysis on the data


Hello R experts out here,

I am a Stata programmer trying to learn R. I have a data frame, df1, in which each row is an ID with a set of values recorded against it:

df1 <- data.frame(name=c("John", "Mary", "Joe", "Tim", "Bob", "Pat"),
                  v1=c(14,2,3,4,14,1),
                  v2=c(21,6,19,31,16,5),
                  v3=c(32,10,22,33,27,30),
                  v4=c(42,17,45,39,34,35),
                  v5=c(98,35,66,0,78,99),
                  v6=c(117,49,0,0, 89,186))

The values in the columns for each visitor ID range from 1 to 1000. They are the days on which each visitor saw the doctor over a period of 1000 days. Some patients stop visiting once their symptoms end, some continue the medication and make routine visits as prescribed by the doctor, and some start visiting again after a long gap if the disease relapses.

I want to create a sparse matrix of all the IDs that visited the doctor on days 1 to 1000. Could you please suggest how to create such a sparse matrix? It's pretty simple and straightforward in Stata, but I can't figure out a way to do it in R.

The end results should be in the form:

name    1   2   3   4   5   6   10  14  16  17  19  21
John                                1               1
Mary        1               1   1           1       
Joe             1                               1

After creating the sparse matrix, I have to predict when each patient's next visit to the doctor will be. My plan was to create the sparse matrix, derive a variable holding the difference between the last two consecutive visits, and then run a logistic regression on it. Are there any other KPIs that could be derived from only the given information to make the analysis more robust? Could someone please tell me whether this idea is sound, or whether there is a better way to approach it?

Thanks in advance.


Solution

  • The first part of the question, how to create a sparse matrix (for which you provided a coded example), is easily answered with the Matrix package. I don't think you need to install it, because it is in the "recommended" category of packages that ship with a standard R distribution.

    library(Matrix)
    help(package = "Matrix")  # overview of the package
    # Drop the character column first, then convert to a sparse matrix
    M <- Matrix(data.matrix(df1[-1]), sparse = TRUE)
    M
    6 x 6 sparse Matrix of class "dgCMatrix"
         v1 v2 v3 v4 v5  v6
    [1,] 14 21 32 42 98 117
    [2,]  2  6 10 17 35  49
    [3,]  3 19 22 45 66   .
    [4,]  4 31 33 39  .   .
    [5,] 14 16 27 34 78  89
    [6,]  1  5 30 35 99 186
    

    For the revised question:

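    # First create the index vectors. Visit days become column indices, so drop
    # NA values and also the 0 placeholders in df1 above (0 is not a valid column).
    vals <- data.matrix(df1[-1])
    keep <- !is.na(vals) & vals > 0
    xix <- row(vals)[keep]   # row index = patient
    xjy <- vals[keep]        # column index = day of visit
    

    Then supply those values as the index arguments, along with enough 1's to populate the indexed positions:

     M <- spMatrix(6, 186, i = xix, j = xjy, x = rep(1, length(xix)))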
    > str(M)
    Formal class 'dgTMatrix' [package "Matrix"] with 6 slots
      ..@ i       : int [1:33] 0 1 2 3 4 5 0 1 2 3 ...
      ..@ j       : int [1:33] 13 1 2 3 13 0 20 5 18 30 ...
      ..@ Dim     : int [1:2] 6 186
      ..@ Dimnames:List of 2
      .. ..$ : NULL
      .. ..$ : NULL
      ..@ x       : num [1:33] 1 1 1 1 1 1 1 1 1 1 ...
      ..@ factors : list()
    > M[1:6, 1:25]  # enough output to show success
    6 x 25 sparse Matrix of class "dgTMatrix"
    
    [1,] . . . . . . . . . . . . . 1 . . . . . . 1 . . . .
    [2,] . 1 . . . 1 . . . 1 . . . . . . 1 . . . . . . . .
    [3,] . . 1 . . . . . . . . . . . . . . . 1 . . 1 . . .
    [4,] . . . 1 . . . . . . . . . . . . . . . . . . . . .
    [5,] . . . . . . . . . . . . . 1 . 1 . . . . . . . . .
    [6,] 1 . . . 1 . . . . . . . . . . . . . . . . . . . .
    >
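
For the prediction part of the question, the gap between the last two visits can be computed straight from df1 without going through the sparse matrix. The following is only a rough sketch under stated assumptions: it treats 0 (or NA) as "no visit recorded", and the helper name gap_features and the extra columns (n_visits, last_visit, mean_gap) are illustrative choices, not something given in the question.

```r
# Sketch: per-patient gap features from the df1 shown in the question.
# Assumption: 0 or NA means "no visit recorded" and is dropped.
df1 <- data.frame(name = c("John", "Mary", "Joe", "Tim", "Bob", "Pat"),
                  v1 = c(14, 2, 3, 4, 14, 1),
                  v2 = c(21, 6, 19, 31, 16, 5),
                  v3 = c(32, 10, 22, 33, 27, 30),
                  v4 = c(42, 17, 45, 39, 34, 35),
                  v5 = c(98, 35, 66, 0, 78, 99),
                  v6 = c(117, 49, 0, 0, 89, 186))

visit_days <- data.matrix(df1[-1])

# Hypothetical helper: summarise one patient's visit days.
gap_features <- function(days) {
  days <- sort(days[!is.na(days) & days > 0])  # real visit days, in order
  gaps <- diff(days)                           # days between consecutive visits
  has_gap <- length(gaps) > 0                  # FALSE for a single-visit patient
  c(n_visits   = length(days),
    last_visit = if (length(days)) max(days) else NA_real_,
    last_gap   = if (has_gap) gaps[length(gaps)] else NA_real_,
    mean_gap   = if (has_gap) mean(gaps) else NA_real_)
}

# One row of features per patient
features <- data.frame(name = df1$name,
                       t(apply(visit_days, 1, gap_features)))
print(features)
```

Here last_gap is the "difference between the last two consecutive visits" described in the question; the other columns are candidates for the additional KPIs asked about, and whether they help would need to be checked against real data.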