rdata-conversionmultivariate-time-series

Convert a time series dataset with multiple date columns into a time series with a unique date column or into a zoo object


I have a large dataset made of multiple irregular timeseries with a specific date column for each series. I want to convert this dataset into a dataframe with a unique date column or into a zoo object.

I tried read_xls(), read.zoo(). I tried to reshape with pivot_longer(). I searched on the web but I have not found any solution yet.

date1 Var1 date2 Var2 date3 Var3
2023-01-13 100.1 2023-01-11 99.7 2022-11-24 102.3
2023-01-16 104.5 2023-01-12 NA 2022-11-25 99.9
2023-01-17 101.6 2023-01-13 99.9 2022-11-28 99.3
2023-01-18 101.8 2023-01-16 99.1 2022-11-29 NA
2023-01-19 NA 2023-01-17 99.5 2022-11-30 NA

Solution

  • Using the data shown reproducibly in the Note at the end, assume that what is wanted is a zoo object with a separate column for each non-date column.

    First create a grouping vector g which looks like c("Var1", "Var1", "Var2", "Var2", "Var3", "Var3") and then convert DF to a list and split it by g giving s. Finally convert each component of s to a zoo object and merge them using cbind. (If a data frame is wanted use fortify.zoo on the result.)

    library(zoo)
    
    g <- rep(names(DF)[sapply(DF, is.numeric)], each = 2)
    s <- split(as.list(DF), g)
    do.call("cbind", lapply(s, function(x) read.zoo(as.data.frame(x))))
    

    giving:

                Var1 Var2  Var3
    2022-11-24    NA   NA 102.3
    2022-11-25    NA   NA  99.9
    2022-11-28    NA   NA  99.3
    2022-11-29    NA   NA    NA
    2022-11-30    NA   NA    NA
    2023-01-11    NA 99.7    NA
    2023-01-12    NA   NA    NA
    2023-01-13 100.1 99.9    NA
    2023-01-16 104.5 99.1    NA
    2023-01-17 101.6 99.5    NA
    2023-01-18 101.8   NA    NA
    2023-01-19    NA   NA    NA
    

    This could be represented as a pipeline like this:

    g <- rep(names(DF)[sapply(DF, is.numeric)], each = 2)
    DF |>
      as.list() |>
      split(g) |>
      lapply(function(x) read.zoo(as.data.frame(x))) |>
      do.call(what = "cbind")
    

    or

    DF |>
      as.list() |>
      (\(x) split(x, rep(names(x)[sapply(x, is.numeric)], each = 2)))() |>
      lapply(\(x) read.zoo(as.data.frame(x))) |>
      do.call(what = "cbind")
    

    Note

    Lines <- "date1 Var1 date2 Var2 date3 Var3
    2023-01-13 100.1 2023-01-11 99.7 2022-11-24 102.3
    2023-01-16 104.5 2023-01-12 NA 2022-11-25 99.9
    2023-01-17 101.6 2023-01-13 99.9 2022-11-28 99.3
    2023-01-18 101.8 2023-01-16 99.1 2022-11-29 NA
    2023-01-19 NA 2023-01-17 99.5 2022-11-30 NA"
    DF <- read.table(text = Lines, header = TRUE)