R TraMineR - why is seqformat throwing an error?


I'm very new to TraMineR, but am trying to examine sequences of modes that patients used for clinical visits over time.

My data looks like this after setting it up for conversion from SPELL to STS format. You can see here that the begin and end variables are integers.

> df %>% head(20)
# A tibble: 20 x 6
      id index begin   end status   status_1
   <int> <int> <int> <int> <fct>       <int>
 1     1     1     1     1 Video           3
 2     1     2     2     2 Video           3
 3     2     1     1     1 Phone           2
 4     2     2     2     2 Phone           2
 5     2     3     3     3 Phone           2
 6     3     1     1     1 Video           3
 7     4     1     1     1 Video           3
 8     5     1     1     1 Phone           2
 9     6     1     1     1 Video           3
10     6     2     2     2 Video           3
11     6     3     3     3 Video           3
12     6     4     4     4 Video           3
13     6     5     5     5 Video           3
14     7     1     1     1 Phone           2
15     7     2     2     2 Phone           2
16     8     1     1     1 Video           3
17     9     1     1     1 Phone           2
18    10     1     1     1 Phone           2
19    10     2     2     2 Phone           2
20    10     3     3     3 InPerson        1

With a quick look using skim() from skimr, we can also see the variable types and that there is no missing data.

> df %>% skimr::skim()
-- Data Summary ------------------------
                           Values    
Name                       Piped data
Number of rows             4530      
Number of columns          6         
_______________________              
Column type frequency:               
  factor                   1         
  numeric                  5         
________________________             
Group variables            None      

-- Variable type: factor ----------------------------------------------------------------------------------------------------------------------------------------------
# A tibble: 1 x 6
  skim_variable n_missing complete_rate ordered n_unique top_counts                    
* <chr>             <int>         <dbl> <lgl>      <int> <chr>                         
1 status                0             1 FALSE          3 Pho: 2496, Vid: 1864, InP: 170

-- Variable type: numeric ---------------------------------------------------------------------------------------------------------------------------------------------
# A tibble: 5 x 11
  skim_variable n_missing complete_rate    mean      sd    p0   p25   p50   p75  p100 hist 
* <chr>             <int>         <dbl>   <dbl>   <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <chr>
1 id                    0             1 1203.   702.        1   584  1207  1817  2426 ▇▇▇▇▇
2 index                 0             1    1.86   1.24      1     1     1     2    11 ▇▁▁▁▁
3 begin                 0             1    1.86   1.24      1     1     1     2    11 ▇▁▁▁▁
4 end                   0             1    1.86   1.24      1     1     1     2    11 ▇▁▁▁▁
5 status_1              0             1    2.37   0.556     1     2     2     3     3 ▁▁▇▁▆

However, when I attempt to use seqformat to convert my data from SPELL to state sequences using this code:

df_sts <- seqformat(df, id = "id", begin = "begin", end = "end", status = "status_1", from = "SPELL", to = "STS", process = FALSE)

I get this error:

Error in is.wholenumber(c(endcolumn, begincolumn)) : 
  'list' object cannot be coerced to type 'integer'

I'm trying to follow the steps outlined in the TraMineR User Guide, but I'm really not sure where this error is coming from since both begin and end variables are integers... Can someone help me understand what the issue is here and how to resolve the "error"?


Solution

  • TraMineR does not seem to play well with tibbles. Converting the data to a plain data frame should do the trick.

    df2 <- as.data.frame(df)
    

    I checked the TraMineR code and found the cause of the error. Tibbles and data frames behave differently when a single column is extracted with [, j]. With a data.frame we obtain a vector; with a tibble the result is still a one-column tibble, which is a list. Combining two such columns with c() therefore gives a list, and as.integer() cannot coerce it.

    library(tidyverse)
    
    df <- tribble(
      ~id, ~index, ~begin, ~end, ~status, ~status_1, 
      1, 1, 1, 1, "Video", 3,
      1, 2, 2, 2, "Video", 3,
      2, 1, 1, 1, "Phone", 2,
      2, 2, 2, 2, "Phone", 2,
      2, 3, 3, 3, "Phone", 2,
      3, 1, 1, 1, "Video", 3,
      4, 1, 1, 1, "Video", 3,
      5, 1, 1, 1, "Phone", 2,
      6, 1, 1, 1, "Video", 3,
      6, 2, 2, 2, "Video", 3,
      6, 3, 3, 3, "Video", 3,
      6, 4, 4, 4, "Video", 3,
      6, 5, 5, 5, "Video", 3) |> 
      mutate(status = factor(status))
    
    # Plain data.frame copy of the same data, used for comparison below
    df2 <- as.data.frame(df)
    
    # Subsetting a tibble
    c(df[,3],df[,4])
    #> $begin
    #> [1] 1 2 1 2 3 1 1 1 1 2 3 4 5
    #> 
    #> $end
    #> [1] 1 2 1 2 3 1 1 1 1 2 3 4 5
    
    # Subsetting a data.frame
    c(df2[,3],df2[,4])
    #> [1] 1 2 1 2 3 1 1 1 1 2 3 4 5 1 2 1 2 3 1 1 1 1 2 3 4 5
    
    is.wholenumber <- function(x){as.integer(x) == x}
    
    # tibble -> error
    all(is.wholenumber(c(df[,3],df[,4])))
    #> Error in is.wholenumber(c(df[, 3], df[, 4])) : 
    #>   'list' object cannot be coerced to type 'integer'
    
    # data.frame -> works as expected
    all(is.wholenumber(c(df2[,3],df2[,4])))
    #> [1] TRUE
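
    The same mechanism can be reproduced without any packages, which may help when checking other code for this problem. The sketch below is only an illustration: the spells data frame and its values are made up, is_whole is a stand-in for the check shown above, and drop = FALSE is used to imitate how a tibble keeps a single column as a one-column table.

    ```r
    # Hypothetical SPELL-like data (made-up values), plain base R
    spells <- data.frame(
      id    = c(1L, 1L, 2L),
      begin = c(1L, 2L, 1L),
      end   = c(1L, 2L, 1L)
    )

    # Stand-in for the whole-number check used in the answer
    is_whole <- function(x) as.integer(x) == x

    # drop = FALSE keeps each column as a one-column data frame (a list),
    # imitating what [, j] returns for a tibble
    as_list <- c(spells[, "begin", drop = FALSE], spells[, "end", drop = FALSE])
    is.list(as_list)

    # The coercion fails; capture the message instead of stopping the script
    msg <- tryCatch(is_whole(as_list), error = function(e) conditionMessage(e))
    msg

    # [[ ]] extracts a plain vector from data frames and tibbles alike
    as_vec <- c(spells[["begin"]], spells[["end"]])
    all(is_whole(as_vec))
    ```

    Extracting with [[ ]] or $ sidesteps the difference in your own code, but since the [, j] subsetting happens inside seqformat, converting with as.data.frame() before the call remains the practical fix.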