I'm doing sequence analysis with TraMineR in R and would like to take into account only the order of the different spells over time. For instance, I would like the sequence A-B-A to be treated the same as A-B-B-B-A when plotting the most frequent sequences or when using the index plot. Is there an option for this type of analysis that does not require changing the data format?
There are two strategies for producing plots that focus on the ordering of states.
You can also produce a typology focusing on state ordering using specific distance measures.
Let's take an example. First build the sequence object:
library(TraMineR)
#>
#> TraMineR development version 2.3-4 (Built: 2022-11-29)
#> Website: http://traminer.unige.ch
#> Please type 'citation("TraMineR")' for citation information.
data(biofam)
## Create the sequence object
bfstates <- c("Parent", "Left", "Married", "Left/Married", "Child", "Left/Child", "Left/Married/Child", "Divorced")
bf.shortlab <- c("P","L","M","LM","C","LC", "LMC", "D")
bf.seq <- seqdef(biofam[,10:25], states=bf.shortlab, labels=bfstates)
#> [>] state coding:
#> [alphabet] [label] [long label]
#> 1 0 P Parent
#> 2 1 L Left
#> 3 2 M Married
#> 4 3 LM Left/Married
#> 5 4 C Child
#> 6 5 LC Left/Child
#> 7 6 LMC Left/Married/Child
#> 8 7 D Divorced
#> [>] 2000 sequences in the data set
#> [>] min/max sequence length: 16/16
Created on 2023-02-21 with reprex v2.0.2
You can remove timing information using the seqdss function, which reduces each sequence to its distinct successive states:
bf.dss <- seqdss(bf.seq)
Then plot the result (any sequence plot will work):
seqfplot(bf.dss)
seqIplot(bf.dss, sortv="from.start")
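To see why this addresses the A-B-A versus A-B-B-B-A case from the question, here is a minimal base R sketch of the idea behind distinct successive states. It does not call TraMineR, and `collapse_spells` is a hypothetical helper introduced only for illustration; it is not how seqdss is implemented.

```r
# Hypothetical helper: keep one entry per spell by collapsing
# consecutive repeats of the same state (run-length encoding)
collapse_spells <- function(states) {
  rle(states)$values
}

s1 <- c("A", "B", "A")
s2 <- c("A", "B", "B", "B", "A")

# Both sequences reduce to the same ordering of states
dss1 <- paste(collapse_spells(s1), collapse = "-")
dss2 <- paste(collapse_spells(s2), collapse = "-")
identical(dss1, dss2)
```

Once durations are dropped, both sequences are counted as the same pattern in a frequency plot.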
The parallel coordinates plot is designed to focus on the order of states only:
seqpcplot(bf.dss)
The result might look messy, depending on your data. You can highlight the most common state orderings by coloring only the patterns that together account for 50% of cases:
seqpcplot(bf.seq, filter = list(type = "function",
                                value = "cumfreq",
                                level = 0.5))
See the following reference for more details.
Bürgin, R. and G. Ritschard (2014), A decorated parallel coordinate plot for categorical longitudinal data, The American Statistician 68(2), 98-103. https://doi.org/10.1080/00031305.2014.887591
If you would like to build a typology focusing on state sequencing, you need to choose the distance measure accordingly. See the guidelines section of the following article for more details.
Studer, M. and Ritschard, G. (2016), What matters in differences between life trajectories: a comparative review of sequence dissimilarity measures. J. R. Stat. Soc. A, 179: 481-511. https://doi.org/10.1111/rssa.12125
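As a rough sketch of what such a typology could look like, one option is to compute optimal matching distances on the distinct-successive-state sequences and cluster them. The choice of measure (OM on DSS with constant substitution costs), the Ward linkage, and the four-cluster cut are all assumptions made for illustration, not recommendations taken from the article; check its guidelines before settling on a measure.

```r
library(TraMineR)
data(biofam)

# Same sequence object as above
bf.shortlab <- c("P", "L", "M", "LM", "C", "LC", "LMC", "D")
bf.seq <- seqdef(biofam[, 10:25], states = bf.shortlab)

# Assumption: drop durations first, then compare the orderings with
# optimal matching using constant substitution costs and an indel cost of 1
bf.dss <- seqdss(bf.seq)
d <- seqdist(bf.dss, method = "OM", indel = 1, sm = "CONSTANT")

# Ward clustering on the dissimilarities; k = 4 is an arbitrary choice
hc <- hclust(as.dist(d), method = "ward.D2")
cl4 <- cutree(hc, k = 4)
table(cl4)
```

The resulting cluster membership can then be passed to the plotting functions, for example seqfplot(bf.dss, group = cl4), to inspect the typical orderings in each group.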