r dataframe ggplot2 traminer

TraMineR sequence plot with ggplot2


I'm new to the TraMineR package and would like to use ggplot2 to create a state distribution plot. The plot below was created with TraMineR, but how can I extract the underlying data and plot it with ggplot2? I would also like to change the axes and colours.

[Image: state distribution plot created with TraMineR]

Sample code:

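# Close the current graphics device only if one is open
# (a bare dev.off() fails when no device is open)
if (dev.cur() > 1) dev.off()
seqdplot(df_new.seq[1:10, ], border = 0,
         axes = TRUE, yaxis = TRUE, xaxis = TRUE, ylab = "",
         cex.legend = 0.5, ncol = 6, legend.prop = .11)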

Sample data:

structure(list(`04:00` = structure(c(19L, 19L, 19L), .Label = c("PC", 
"SL", "EA", "WR", "ST", "DI", "FP", "FO", "LA", "IR", "HO", "CH", 
"CA", "LE", "CO", "TV", "RA", "TR", "OT", "*", "%"), class = "factor"), 
    `04:10` = structure(c(19L, 19L, 19L), .Label = c("PC", "SL", 
    "EA", "WR", "ST", "DI", "FP", "FO", "LA", "IR", "HO", "CH", 
    "CA", "LE", "CO", "TV", "RA", "TR", "OT", "*", "%"), class = "factor"), 
    `04:20` = structure(c(19L, 19L, 19L), .Label = c("PC", "SL", 
    "EA", "WR", "ST", "DI", "FP", "FO", "LA", "IR", "HO", "CH", 
    "CA", "LE", "CO", "TV", "RA", "TR", "OT", "*", "%"), class = "factor"), 
    `04:30` = structure(c(19L, 19L, 19L), .Label = c("PC", "SL", 
    "EA", "WR", "ST", "DI", "FP", "FO", "LA", "IR", "HO", "CH", 
    "CA", "LE", "CO", "TV", "RA", "TR", "OT", "*", "%"), class = "factor"), 
    `04:40` = structure(c(19L, 19L, 19L), .Label = c("PC", "SL", 
    "EA", "WR", "ST", "DI", "FP", "FO", "LA", "IR", "HO", "CH", 
    "CA", "LE", "CO", "TV", "RA", "TR", "OT", "*", "%"), class = "factor"), 
    `04:50` = structure(c(19L, 19L, 19L), .Label = c("PC", "SL", 
    "EA", "WR", "ST", "DI", "FP", "FO", "LA", "IR", "HO", "CH", 
    "CA", "LE", "CO", "TV", "RA", "TR", "OT", "*", "%"), class = "factor"), 
    `05:00` = structure(c(19L, 19L, 19L), .Label = c("PC", "SL", 
    "EA", "WR", "ST", "DI", "FP", "FO", "LA", "IR", "HO", "CH", 
    "CA", "LE", "CO", "TV", "RA", "TR", "OT", "*", "%"), class = "factor"), 
    `05:10` = structure(c(19L, 19L, 19L), .Label = c("PC", "SL", 
    "EA", "WR", "ST", "DI", "FP", "FO", "LA", "IR", "HO", "CH", 
    "CA", "LE", "CO", "TV", "RA", "TR", "OT", "*", "%"), class = "factor"), 
    `05:20` = structure(c(19L, 19L, 19L), .Label = c("PC", "SL", 
    "EA", "WR", "ST", "DI", "FP", "FO", "LA", "IR", "HO", "CH", 
    "CA", "LE", "CO", "TV", "RA", "TR", "OT", "*", "%"), class = "factor"), 
    `05:30` = structure(c(19L, 19L, 19L), .Label = c("PC", "SL", 
    "EA", "WR", "ST", "DI", "FP", "FO", "LA", "IR", "HO", "CH", 
    "CA", "LE", "CO", "TV", "RA", "TR", "OT", "*", "%"), class = "factor")), row.names = c(NA, 
3L), start = 1, missing = NA, void = "%", nr = "*", alphabet = c("PC", 
"SL", "EA", "WR", "ST", "DI", "FP", "FO", "LA", "IR", "HO", "CH", 
"CA", "LE", "CO", "TV", "RA", "TR", "OT"), class = c("stslist", 
"data.frame"), labels = c("Personal care", "Sleep", "Eating", 
"Work", "Study", "Dishwash", "Food preparation", "Household upkeep", 
"Laundry", "Ironing", "Housework", "Childcare", "Care for adults", 
"Leisure", "Computing", "TV", "Radio and music", "Travel", "Other"
), cpal = c("#FFB3B5", "#F8B8A2", "#EDBE91", "#DDC485", "#CBCA82", 
"#B5D087", "#9DD594", "#84D8A6", "#6ED9B9", "#61D9CD", "#66D7DF", 
"#7AD3ED", "#96CCF8", "#B3C5FD", "#CEBDFD", "#E3B6F7", "#F3B1EC", 
"#FDAFDC", "#FFB0CA"), missing.color = "darkgrey", xtstep = 19, tick.last = FALSE, Version = "2.2-2")

Solution

  • The online help page of seqplot (seqdplot is an alias for seqplot with type="d") states:

    A State distribution plot (type="d") represents the sequence of the cross-sectional state frequencies by position (time point) computed by the seqstatd function and rendered with the plot.stslist.statd method. Such plots are also known as chronograms.

    So the seqstatd function gives you the data that seqdplot plots. More precisely, the distributions are in the Frequencies element of the object it returns.

    Your sample data contains only three sequences of length 10, each consisting of a single spell in state 'OT'. I stored it in s.spl:

    s.spl
    #   Sequence                     
    # 1 OT-OT-OT-OT-OT-OT-OT-OT-OT-OT
    # 2 OT-OT-OT-OT-OT-OT-OT-OT-OT-OT
    # 3 OT-OT-OT-OT-OT-OT-OT-OT-OT-OT
    

    The distributions by position are

    sd <- seqstatd(s.spl)
    sd$Frequencies
    #    04:00 04:10 04:20 04:30 04:40 04:50 05:00 05:10 05:20 05:30
    # PC     0     0     0     0     0     0     0     0     0     0
    # SL     0     0     0     0     0     0     0     0     0     0
    # EA     0     0     0     0     0     0     0     0     0     0
    # WR     0     0     0     0     0     0     0     0     0     0
    # ST     0     0     0     0     0     0     0     0     0     0
    # DI     0     0     0     0     0     0     0     0     0     0
    # FP     0     0     0     0     0     0     0     0     0     0
    # FO     0     0     0     0     0     0     0     0     0     0
    # LA     0     0     0     0     0     0     0     0     0     0
    # IR     0     0     0     0     0     0     0     0     0     0
    # HO     0     0     0     0     0     0     0     0     0     0
    # CH     0     0     0     0     0     0     0     0     0     0
    # CA     0     0     0     0     0     0     0     0     0     0
    # LE     0     0     0     0     0     0     0     0     0     0
    # CO     0     0     0     0     0     0     0     0     0     0
    # TV     0     0     0     0     0     0     0     0     0     0
    # RA     0     0     0     0     0     0     0     0     0     0
    # TR     0     0     0     0     0     0     0     0     0     0
    # OT     1     1     1     1     1     1     1     1     1     1
    

    Good luck if you want to rewrite TraMineR's plotting facilities with ggplot.
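For the state distribution plot alone, a minimal sketch of the ggplot2 route could look like the following. It assumes that sd$Frequencies is a matrix-like object with states in rows and positions in columns, as in the printout above. So that the snippet runs without TraMineR, freq is a stand-in rebuilt by hand from that printout; with your own data you would use sd$Frequencies instead. The names freq, long and p, the axis titles and the "Dark 3" palette are illustrative choices, not part of TraMineR.

```r
# Stand-in for sd$Frequencies, rebuilt from the printed output above.
# With TraMineR loaded, use instead: freq <- as.matrix(seqstatd(s.spl)$Frequencies)
states <- c("PC", "SL", "EA", "WR", "ST", "DI", "FP", "FO", "LA", "IR",
            "HO", "CH", "CA", "LE", "CO", "TV", "RA", "TR", "OT")
times  <- c("04:00", "04:10", "04:20", "04:30", "04:40",
            "04:50", "05:00", "05:10", "05:20", "05:30")
freq <- matrix(0, nrow = length(states), ncol = length(times),
               dimnames = list(states, times))
freq["OT", ] <- 1

# Reshape to long format: one row per (state, time) combination.
names(dimnames(freq)) <- c("state", "time")
long <- as.data.frame(as.table(freq), responseName = "share")

# Any vector of colours works here; hcl.colors() needs R >= 3.6.
cols <- setNames(hcl.colors(length(states), palette = "Dark 3"), states)

# Stacked bars of width 1 approximate the look of the chronogram.
if (requireNamespace("ggplot2", quietly = TRUE)) {
  library(ggplot2)
  p <- ggplot(long, aes(x = time, y = share, fill = state)) +
    geom_col(width = 1) +
    scale_fill_manual(values = cols, drop = FALSE) +
    labs(x = "Time of day", y = "Relative frequency", fill = "Activity") +
    theme_minimal() +
    theme(axis.text.x = element_text(angle = 90, vjust = 0.5))
  print(p)
}
```

Once the frequencies are in long format, the axes and colours are controlled through the usual ggplot2 scales and theme settings, for example scale_x_discrete(breaks = ...) to thin out the time labels.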