Tags: r, data-visualization, data-manipulation, timeline, vistime

Data Manipulation/Imputation for a Timeline Using R


I am trying to create a timeline using the vistime package in R. My problem is adding rows for the periods where no data exists, so that the timeline is continuous.

Doing this by hand is tedious, so I would like to automate filling in a default label (here "UP") for the periods where data is absent.

Here is an example of the data and current output:

library(vistime)

syst <- data.frame(Position = rep("DOWN", each = 5),
                   Name = c("SYS2", "SYS2", "SYS4", "SYS4", "SYS6"),
                   start = c("2018-10-16", "2018-12-06", "2018-10-24", "2018-12-05", "2018-11-09"),
                   end = c("2018-11-26", "2018-12-31", "2018-11-23", "2018-12-31", "2018-12-31"),
                   color = rep("#FF0000", each = 5),
                   fontcolor = rep("white", each = 5))

vistime(syst, events = "Position", groups = "Name")

[Image: actual output]

Desired output:

syst2 <- data.frame(Position = rep(c( "UP","DOWN"), 5),
        Name = rep(c("SYS2", "SYS2","SYS4","SYS4","SYS6"), each=2),
        start = c("2018-10-01","2018-10-16","2018-11-26","2018-12-06","2018-10-01","2018-10-24","2018-11-23","2018-12-05","2018-10-01","2018-11-09"),
        end = c("2018-10-16","2018-11-26","2018-12-06","2018-12-31","2018-10-24","2018-11-23","2018-12-05","2018-12-31","2018-11-09","2018-12-31"),
        color = rep(c("#008000",'#FF0000'), 5),
        fontcolor = rep(c('white'), 10))


vistime(syst2, events = "Position", groups = "Name")

[Image: expected output]


Solution

  • One way to do this is as follows. First, let

    rng <- c("2018-10-01", "2018-12-31")
    

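    be a vector holding the overall start and end dates of the timeline. I also added stringsAsFactors = FALSE to the definition of syst so that the date columns are character vectors rather than factors; otherwise combining them with the new dates in rng goes wrong. (In R 4.0.0 and later, stringsAsFactors = FALSE is already the default.)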

    Then we have

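    library(tidyverse)
    # For each system, bind its DOWN rows to new UP rows covering the gaps.
    # Gaps run from rng[1] to the first start, from each end to the next start,
    # and from the last end to rng[2]; this assumes each system's rows are
    # sorted by start and do not overlap.
    syst2 <- syst %>% group_by(Name) %>% 
      do({bind_rows(., data.frame(Position = "UP", Name = .$Name[1], 
                                  start = c(rng[1], .$end),
                                  end = c(.$start, rng[2]), 
                                  color = "#008000", 
                                  fontcolor = "white", 
                                  stringsAsFactors = FALSE))}) %>%
      filter(start != end)  # drop zero-length gaps
    vistime(syst2, events = "Position", groups = "Name")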
    

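    So we group by Name and, for each group, bind the existing rows to a new data frame of "UP" rows in which every column is fixed except start and end. The only trick is in start and end: each gap begins at rng[1] or where a DOWN period ends, and finishes where the next DOWN period starts or at rng[2]. This relies on the rows of each group being in chronological order. Lastly, I filter out the rows whose start and end dates coincide, such as the empty gap after a DOWN period that lasts until 2018-12-31.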

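The remark about stringsAsFactors = FALSE matters because of how c() treats factors: when the first argument is a character vector, a factor contributes its integer codes rather than its labels, so c(rng[1], .$end) would silently produce wrong "dates" if end were a factor. The snippet below is a small base-R illustration of that behaviour, not part of the answer; the variable names are made up for the example. It only affects R versions before 4.0.0, where data.frame() converted strings to factors by default.

```r
# Illustration only: what happens if the end dates are stored as a factor.
end_fac <- factor(c("2018-11-26", "2018-12-31"))  # levels get codes 1 and 2
rng <- c("2018-10-01", "2018-12-31")

# c() dispatches on its first argument (a character vector here), so the
# factor is reduced to its integer codes, which are then turned into strings.
bad <- c(rng[1], end_fac)
print(bad)   # contains "2018-10-01", "1", "2": the real dates are lost

# Converting the factor to character first keeps the actual dates.
good <- c(rng[1], as.character(end_fac))
print(good)  # contains "2018-10-01", "2018-11-26", "2018-12-31"
```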
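The same idea can also be sketched in base R, without the tidyverse or dplyr's do() (which dplyr now documents as superseded). This is an alternative, not the answer's code: fill_gaps is a hypothetical helper name, and unlike the dplyr version it sorts each system's rows by start before computing the gaps, so it only assumes that the DOWN periods of one system do not overlap. Dates are kept as "YYYY-MM-DD" strings, which sort and compare correctly as text.

```r
# Sketch: fill the gaps between DOWN periods with UP rows, per system.
syst <- data.frame(Position = "DOWN",
                   Name = c("SYS2", "SYS2", "SYS4", "SYS4", "SYS6"),
                   start = c("2018-10-16", "2018-12-06", "2018-10-24", "2018-12-05", "2018-11-09"),
                   end = c("2018-11-26", "2018-12-31", "2018-11-23", "2018-12-31", "2018-12-31"),
                   color = "#FF0000",
                   fontcolor = "white",
                   stringsAsFactors = FALSE)
rng <- c("2018-10-01", "2018-12-31")  # overall timeline range

# Hypothetical helper: d holds the DOWN rows of a single system.
fill_gaps <- function(d, rng, label = "UP", color = "#008000") {
  d <- d[order(d$start), ]  # gaps are only meaningful in time order
  up <- data.frame(Position = label,
                   Name = d$Name[1],
                   start = c(rng[1], d$end),   # a gap begins where a DOWN period ends
                   end = c(d$start, rng[2]),   # and stops where the next one starts
                   color = color,
                   fontcolor = "white",
                   stringsAsFactors = FALSE)
  out <- rbind(up, d)                   # columns are matched by name
  out <- out[out$start != out$end, ]    # drop zero-length gaps
  out[order(out$start), ]               # interleave UP and DOWN rows
}

# Apply the helper to each system and stack the results.
syst2 <- do.call(rbind, lapply(split(syst, syst$Name), fill_gaps, rng = rng))
rownames(syst2) <- NULL
print(syst2)
```

With the example data, this syst2 matches the hand-built syst2 from the question row for row, and it can be plotted with vistime(syst2, events = "Position", groups = "Name") as before.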