rdataframetime-seriesmultivariate-time-series

Optimal data structure for time-series analysis


My data is large CPI dataset containing headline CPI as well as various subindexes (i.e. food, non-food, services). Initial data stored in Excel spreadsheet in wide format like this:

name 2002Q1 2002Q2 2002Q3 2002Q4
cpi 100 103.09 104.29 105.42
food 100 102.82 103.71 104.18
non_food 100 101.23 102.01 102.70
services 100 107.47 110.41 114.50

After importing data in dataframe df using read_excel, I pivot it in longer format:

df <- pivot_longer(df,
                    cols = !name,
                    names_to = "date",
                    values_to = "index"
                   )

So, my dataframe looks like this:

  name  date  index
   <chr> <chr> <dbl>
 1 cpi   2002Q1  100 
 2 cpi   2002Q2  103.
 3 cpi   2002Q3  104.
 4 cpi   2002Q4  105.
 5 food  2002Q1  100
 6 food  2002Q2  102.
...

Now I'm trying to convert dataframe into ts object to analyze acf/pacf functions, but if I use df.ts <- ts(df, start = c(2002,1), frequency = 4) then I found out that new mts object not only change names into digital codes, but also treat subindexes (i.e. food) as a continuation of headline index (cpi) in periods 2003Q1, 2003Q2 and so on, which is not correct (food is a different index in 2002Q1-2002Q4 time period). In Stata panel data structure was pretty much the same and I was able to use expressions like acf ln_food if name==food, but it looks like mts object in R works a little bit differently.

What is a better way of data layout in mts object to effectively analyse acf/pacf functions, take differences, fit arima models and calculate correlations? Should it be organized like this or something:

date    cpi   food non-food services
2002Q1  100   100   100     100
2002Q2  103.  102.  101.    107.  
2002Q3  104.  103.  102.    110.
2002Q4  105.  104.  102.    114.

And also how can I retain name of the series or id of a series in mts objects?


Solution

  • The following converts df (the data frame shown at the top of the question and shown in Note at end) to a "ts" object and then takes the acf. The as.ts line could be omitted in which case it would return a "zoo" time series object.

    library(tibble)
    library(zoo)
    
    tser <- df %>%
      column_to_rownames("name") %>%
      t %>%
      as.data.frame %>%
      rownames_to_column("date") %>%
      read.zoo(FUN = as.yearqtr) %>%
      as.ts
    
    tser
    ##            cpi   food non_food services
    ## 2002 Q1 100.00 100.00   100.00   100.00
    ## 2002 Q2 103.09 102.82   101.23   107.47
    ## 2002 Q3 104.29 103.71   102.01   110.41
    ## 2002 Q4 105.42 104.18   102.70   114.50
    
    acf(tser)
    

    Alternately if long is the data frame at the end of the question then this gives the same result.

    library(tidyr)
    library(zoo)
    
    tser2 <- long %>%
      pivot_wider(values_from = index) %>%
      read.zoo(FUN = as.yearqtr) %>%
      as.ts
    

    Note

    df <- structure(list(name = c("cpi", "food", "non_food", "services"
    ), `2002Q1` = c(100L, 100L, 100L, 100L), `2002Q2` = c(103.09, 
    102.82, 101.23, 107.47), `2002Q3` = c(104.29, 103.71, 102.01, 
    110.41), `2002Q4` = c(105.42, 104.18, 102.7, 114.5)), class = "data.frame", row.names = c(NA, -4L))