The seqefsub function in TraMineR gives interesting results on event subsequences. I work with sequences of geographical locations. Is there a way to tell whether a listed subsequence is a "complete subsequence"?
Here is an example.
library(TraMineR)
id = c(rep(1,5), rep(2,3), rep(3,3), rep(4,3), rep(5,2), rep(6,2), rep(7,3), rep(8,3))
begin = c(1963, 1969, 1969, 1974, 2004, 1971, 1976, 1984, 1996, 1998, 2011, 1997, 2008, 2011, 1967, 1971, 1972, 1985, 1971, 1980, 1986, 1974, 2000, 2002)
end = c(1969, 1969, 1974, 2004, 2012, 1976, 1984, 2012, 1998, 2011, 2012, 2008, 2011, 2012, 1971, 2012, 1985, 2012, 1980, 1986, 2012, 2000, 2002, 2012)
status = c(1, 5, 6, 5, 1, 1, 5, 1, 1, 3, 8, 1, 3, 1, 1, 5, 1, 8, 1, 5, 1, 1, 8, 1)
df = data.frame(id,begin,end,status)
df.seq1 = seqformat(df, from = "SPELL", to="STS", process = FALSE)
df.seq2 <- seqdef(df.seq1, informat='STS')
df.seq3 <- seqecreate(df.seq2, tevent = "transition")
fsubseq <- seqefsub(df.seq3, min.support = 1)
There are 8 sequences, in which status denotes a geographical location. The time unit is one year. The fsubseq object returned by seqefsub lists the subsequences found:
    Subsequence             Support  Count
1   (*)                     0.875    7
2   (*)-(*>1)               0.875    7
3   (*>1)                   0.875    7
4   (*)-(*>1)-(1>5)         0.375    3
5   (*)-(1>5)               0.375    3
6   (*>1)-(1>5)             0.375    3
7   (1>5)                   0.375    3
8   (5>1)                   0.375    3
9   (*)-(*>1)-(1>3)         0.250    2
10  (*)-(*>1)-(1>5)-(5>1)   0.250    2
11  (*)-(*>1)-(1>8)         0.250    2
12  (*)-(*>1)-(5>1)         0.250    2
13  (*)-(1>3)               0.250    2
14  (*)-(1>5)-(5>1)         0.250    2
15  (*)-(1>8)               0.250    2
16  (*)-(5>1)               0.250    2
17  (*>1)-(1>3)             0.250    2
18  (*>1)-(1>5)-(5>1)       0.250    2
19  (*>1)-(1>8)             0.250    2
20  (*>1)-(5>1)             0.250    2
21  (1>3)                   0.250    2
22  (1>5)-(5>1)             0.250    2
23  (1>8)                   0.250    2
What I call a "complete subsequence" is a subsequence that encompasses all successive states of one individual. In this example there are seven: 1/6/5/1, 1/5/1, 1/3/8, 1/3/1, 1/5, 1/8, 1/8/1. The "complete subsequence" 1/5/1 corresponds to line 10. The complete subsequences are hard to spot in the list, so my question is whether there is a way to filter the list so that only they remain.
From what I understand, what you call "complete subsequences" are the sequences of distinct successive states.
The distinct successive states are obtained from the state sequences with seqdss, and the frequencies of the sequences with seqtab. So, we get the frequencies of what you call "complete subsequences" with:
seqtab(seqdss(df.seq2))
# Freq Percent
# 1/1-5/1-1/1 2 25
# 1/1-3/1-1/1 1 12
# 1/1-3/1-8/1 1 12
# 1/1-5/1 1 12
# 1/1-6/1-5/1-1/1 1 12
# 1/1-8/1 1 12
# 1/1-8/1-1/1 1 12
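If you also want to see which individual has which "complete subsequence", the same idea can be sketched in base R directly on the spell data. This is only an illustration, not TraMineR code: dss_string is a hypothetical helper, and it assumes that when two spells share a year the later spell overwrites the earlier one, which is consistent with the 1/6/5/1 sequence in the table above.

```r
# Spell data from the question.
id     <- c(rep(1, 5), rep(2, 3), rep(3, 3), rep(4, 3),
            rep(5, 2), rep(6, 2), rep(7, 3), rep(8, 3))
begin  <- c(1963, 1969, 1969, 1974, 2004, 1971, 1976, 1984,
            1996, 1998, 2011, 1997, 2008, 2011, 1967, 1971,
            1972, 1985, 1971, 1980, 1986, 1974, 2000, 2002)
end    <- c(1969, 1969, 1974, 2004, 2012, 1976, 1984, 2012,
            1998, 2011, 2012, 2008, 2011, 2012, 1971, 2012,
            1985, 2012, 1980, 1986, 2012, 2000, 2002, 2012)
status <- c(1, 5, 6, 5, 1, 1, 5, 1, 1, 3, 8, 1,
            3, 1, 1, 5, 1, 8, 1, 5, 1, 1, 8, 1)
df <- data.frame(id, begin, end, status)

# Hypothetical helper (not part of TraMineR): expand one individual's
# spells to one state per year, then collapse runs of identical states.
dss_string <- function(spells) {
  years  <- seq(min(spells$begin), max(spells$end))
  states <- rep(NA_real_, length(years))
  for (i in seq_len(nrow(spells))) {
    covered <- years >= spells$begin[i] & years <= spells$end[i]
    # Assumption: a later spell overwrites an earlier one on shared years.
    states[covered] <- spells$status[i]
  }
  runs <- rle(states[!is.na(states)])
  paste(runs$values, collapse = "/")
}

# One string of distinct successive states per individual, named by id.
dss_by_id <- sapply(split(df, df$id), dss_string)
dss_by_id

# Frequency of each distinct string.
table(dss_by_id)
```

The named vector dss_by_id gives one string per id, so it can be merged back onto other individual-level data; table(dss_by_id) plays the role of seqtab here.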