TraMineR in R for sequence analysis: how to account for state order rather than spell length?


I'm doing sequence analysis with TraMineR in R and I would like to take into account only the order of the different spells over time, not their length. For instance, I would like the sequence A-B-A to be treated the same as A-B-B-B-A when plotting the most frequent sequences or when using the index plot. Is there an option for this type of analysis that does not require changing the data format?


Solution

  • There are two strategies to produce plots focusing on the ordering of the states: removing the timing information from the sequences, or using parallel coordinate plots.

    You can also produce a typology focusing on state ordering using specific distance measures.

    Example

    Let's take an example. First build the sequence object:

    library(TraMineR)
    #> 
    #> TraMineR development version 2.3-4 (Built: 2022-11-29)
    #> Website: http://traminer.unige.ch
    #> Please type 'citation("TraMineR")' for citation information.
    data(biofam)
    ## Create the sequence object
    bfstates <- c("Parent", "Left", "Married", "Left/Married",  "Child", "Left/Child", "Left/Married/Child", "Divorced")
    bf.shortlab <- c("P","L","M","LM","C","LC", "LMC", "D")
    bf.seq <- seqdef(biofam[,10:25], states=bf.shortlab, labels=bfstates)
    #>  [>] state coding:
    #>        [alphabet]  [label]  [long label]
    #>      1  0           P        Parent
    #>      2  1           L        Left
    #>      3  2           M        Married
    #>      4  3           LM       Left/Married
    #>      5  4           C        Child
    #>      6  5           LC       Left/Child
    #>      7  6           LMC      Left/Married/Child
    #>      8  7           D        Divorced
    #>  [>] 2000 sequences in the data set
    #>  [>] min/max sequence length: 16/16
    

    Created on 2023-02-21 with reprex v2.0.2

    Remove any timing information

    You can remove timing information using the seqdss function, which reduces each sequence to its distinct successive states (so A-B-B-B-A becomes A-B-A):

    bf.dss <- seqdss(bf.seq)
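
    As a quick check of what seqdss does, here is a minimal sketch on a made-up data set of two sequences (the toy data frame and its column names are assumptions for illustration, not part of biofam). Both sequences have the same state order but different spell lengths, so they should reduce to the same string:

    ```r
    library(TraMineR)

    ## Hypothetical toy data: two sequences with the same state order
    ## (A, then B, then A) but different spell lengths
    toy <- data.frame(t1 = c("A", "A"),
                      t2 = c("B", "B"),
                      t3 = c("A", "B"),
                      t4 = c("A", "B"),
                      t5 = c("A", "A"),
                      stringsAsFactors = FALSE)
    toy.seq <- seqdef(toy)

    ## Keep only the distinct successive states of each sequence
    toy.dss <- seqdss(toy.seq)

    ## Print each reduced sequence as a single string
    toy.str <- as.vector(seqconc(toy.dss))
    print(toy.str)
    ```

    The bf.dss object created above holds the same kind of reduced sequences for the biofam data.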
    

    Then plot bf.dss (any sequence plot will work):

    seqfplot(bf.dss)
    

    seqIplot(bf.dss, sortv="from.start")
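
    If you also want the numbers behind the frequency plot, one possible complement is to tabulate the most frequent orderings with seqtab. This is a sketch, not part of the original answer; it rebuilds the objects so that it runs on its own, and the choice of the ten most frequent orderings is arbitrary:

    ```r
    library(TraMineR)
    data(biofam)

    ## Same sequence object as above (long labels omitted for brevity)
    bf.shortlab <- c("P", "L", "M", "LM", "C", "LC", "LMC", "D")
    bf.seq <- seqdef(biofam[, 10:25], states = bf.shortlab)

    ## Drop timing information, keeping only the order of states
    bf.dss <- seqdss(bf.seq)

    ## Table of the ten most frequent state orderings
    bf.tab <- seqtab(bf.dss, idxs = 1:10, format = "STS")
    print(bf.tab)
    ```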
    

    Parallel Coordinate plots

    Parallel coordinate plots focus on the order of states only:

    seqpcplot(bf.dss)
    

    The result might look messy, depending on your data. You can highlight the most common orderings of states by showing in color only the patterns that together account for 50% of the cases:

    seqpcplot(bf.seq , filter = list(type = "function",
                                     value = "cumfreq",
                                     level = 0.5))
    

    See the following reference for more.

    Bürgin, R. and G. Ritschard (2014), A decorated parallel coordinate plot for categorical longitudinal data, The American Statistician 68(2), 98-103. https://doi.org/10.1080/00031305.2014.887591

    Typology

    If you would like to build a typology focusing on state sequencing, you need to choose the distance measure accordingly. See the guidelines section of the following article for more details.

    Studer, M. and Ritschard, G. (2016), What matters in differences between life trajectories: a comparative review of sequence dissimilarity measures. J. R. Stat. Soc. A, 179: 481-511. https://doi.org/10.1111/rssa.12125
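
    As a rough illustration of the typology idea, the sketch below computes distances on the sequences with timing removed and clusters them. The choices made here are assumptions for the example, not recommendations taken from the article: the LCS distance applied to the seqdss output, Ward clustering, and an arbitrary number of four clusters. The article reviews other measures that may suit your data better.

    ```r
    library(TraMineR)
    data(biofam)

    ## Same sequence object as above (long labels omitted for brevity)
    bf.shortlab <- c("P", "L", "M", "LM", "C", "LC", "LMC", "D")
    bf.seq <- seqdef(biofam[, 10:25], states = bf.shortlab)

    ## Keep only the order of states
    bf.dss <- seqdss(bf.seq)

    ## Pairwise distances between state orderings
    ## (LCS is an illustrative choice, see the lead-in)
    bf.dist <- seqdist(bf.dss, method = "LCS")

    ## Hierarchical clustering and an arbitrary four-group typology
    bf.clust <- hclust(as.dist(bf.dist), method = "ward.D2")
    bf.type <- cutree(bf.clust, k = 4)
    print(table(bf.type))
    ```

    The resulting groups can then be inspected with any sequence plot, for example seqfplot(bf.dss, group = bf.type).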