My data is large CPI dataset containing headline CPI as well as various subindexes (i.e. food, non-food, services). Initial data stored in Excel spreadsheet in wide format like this:
name | 2002Q1 | 2002Q2 | 2002Q3 | 2002Q4 |
---|---|---|---|---|
cpi | 100 | 103.09 | 104.29 | 105.42 |
food | 100 | 102.82 | 103.71 | 104.18 |
non_food | 100 | 101.23 | 102.01 | 102.70 |
services | 100 | 107.47 | 110.41 | 114.50 |
After importing data in dataframe df using read_excel, I pivot it in longer format:
df <- pivot_longer(df,
cols = !name,
names_to = "date",
values_to = "index"
)
So, my dataframe looks like this:
name date index
<chr> <chr> <dbl>
1 cpi 2002Q1 100
2 cpi 2002Q2 103.
3 cpi 2002Q3 104.
4 cpi 2002Q4 105.
5 food 2002Q1 100
6 food 2002Q2 102.
...
Now I'm trying to convert dataframe into ts object to analyze acf/pacf functions, but if I use df.ts <- ts(df, start = c(2002,1), frequency = 4)
then I found out that new mts object not only change names into digital codes, but also treat subindexes (i.e. food) as a continuation of headline index (cpi) in periods 2003Q1, 2003Q2 and so on, which is not correct (food is a different index in 2002Q1-2002Q4 time period).
In Stata panel data structure was pretty much the same and I was able to use expressions like acf ln_food if name==food
, but it looks like mts object in R works a little bit differently.
What is a better way of data layout in mts object to effectively analyse acf/pacf functions, take differences, fit arima models and calculate correlations? Should it be organized like this or something:
date cpi food non-food services
2002Q1 100 100 100 100
2002Q2 103. 102. 101. 107.
2002Q3 104. 103. 102. 110.
2002Q4 105. 104. 102. 114.
And also how can I retain name of the series or id of a series in mts objects?
The following converts df
(the data frame shown at the top of the question and shown in Note at end) to a "ts" object and then takes the acf
. The as.ts
line could be omitted in which case it would return a "zoo" time series object.
library(tibble)
library(zoo)
tser <- df %>%
column_to_rownames("name") %>%
t %>%
as.data.frame %>%
rownames_to_column("date") %>%
read.zoo(FUN = as.yearqtr) %>%
as.ts
tser
## cpi food non_food services
## 2002 Q1 100.00 100.00 100.00 100.00
## 2002 Q2 103.09 102.82 101.23 107.47
## 2002 Q3 104.29 103.71 102.01 110.41
## 2002 Q4 105.42 104.18 102.70 114.50
acf(tser)
Alternately if long
is the data frame at the end of the question then this gives the same result.
library(tidyr)
library(zoo)
tser2 <- long %>%
pivot_wider(values_from = index) %>%
read.zoo(FUN = as.yearqtr) %>%
as.ts
df <- structure(list(name = c("cpi", "food", "non_food", "services"
), `2002Q1` = c(100L, 100L, 100L, 100L), `2002Q2` = c(103.09,
102.82, 101.23, 107.47), `2002Q3` = c(104.29, 103.71, 102.01,
110.41), `2002Q4` = c(105.42, 104.18, 102.7, 114.5)), class = "data.frame", row.names = c(NA, -4L))