r dataframe time-series multivariate-time-series

Optimal data structure for time-series analysis

My data is large CPI dataset containing headline CPI as well as various subindexes (i.e. food, non-food, services). Initial data stored in Excel spreadsheet in wide format like this:

name	2002Q1	2002Q2	2002Q3	2002Q4
cpi	100	103.09	104.29	105.42
food	100	102.82	103.71	104.18
non_food	100	101.23	102.01	102.70
services	100	107.47	110.41	114.50

After importing data in dataframe df using read_excel, I pivot it in longer format:

df <- pivot_longer(df,
                    cols = !name,
                    names_to = "date",
                    values_to = "index"
                   )

So, my dataframe looks like this:

  name  date  index
   <chr> <chr> <dbl>
 1 cpi   2002Q1  100 
 2 cpi   2002Q2  103.
 3 cpi   2002Q3  104.
 4 cpi   2002Q4  105.
 5 food  2002Q1  100
 6 food  2002Q2  102.
...

Now I'm trying to convert dataframe into ts object to analyze acf/pacf functions, but if I use df.ts <- ts(df, start = c(2002,1), frequency = 4) then I found out that new mts object not only change names into digital codes, but also treat subindexes (i.e. food) as a continuation of headline index (cpi) in periods 2003Q1, 2003Q2 and so on, which is not correct (food is a different index in 2002Q1-2002Q4 time period). In Stata panel data structure was pretty much the same and I was able to use expressions like acf ln_food if name==food, but it looks like mts object in R works a little bit differently.

What is a better way of data layout in mts object to effectively analyse acf/pacf functions, take differences, fit arima models and calculate correlations? Should it be organized like this or something:

date    cpi   food non-food services
2002Q1  100   100   100     100
2002Q2  103.  102.  101.    107.  
2002Q3  104.  103.  102.    110.
2002Q4  105.  104.  102.    114.

And also how can I retain name of the series or id of a series in mts objects?

Solution

The following converts df (the data frame shown at the top of the question and shown in Note at end) to a "ts" object and then takes the acf. The as.ts line could be omitted in which case it would return a "zoo" time series object.

library(tibble)
library(zoo)

tser <- df %>%
  column_to_rownames("name") %>%
  t %>%
  as.data.frame %>%
  rownames_to_column("date") %>%
  read.zoo(FUN = as.yearqtr) %>%
  as.ts

tser
##            cpi   food non_food services
## 2002 Q1 100.00 100.00   100.00   100.00
## 2002 Q2 103.09 102.82   101.23   107.47
## 2002 Q3 104.29 103.71   102.01   110.41
## 2002 Q4 105.42 104.18   102.70   114.50

acf(tser)

Alternately if long is the data frame at the end of the question then this gives the same result.

library(tidyr)
library(zoo)

tser2 <- long %>%
  pivot_wider(values_from = index) %>%
  read.zoo(FUN = as.yearqtr) %>%
  as.ts

Note

df <- structure(list(name = c("cpi", "food", "non_food", "services"
), `2002Q1` = c(100L, 100L, 100L, 100L), `2002Q2` = c(103.09, 
102.82, 101.23, 107.47), `2002Q3` = c(104.29, 103.71, 102.01, 
110.41), `2002Q4` = c(105.42, 104.18, 102.7, 114.5)), class = "data.frame", row.names = c(NA, -4L))