Here is my toy time series data:
library(tidyverse); library(tsibble); library(feasts)
library(lubridate)  # for mdy(); only attached by tidyverse itself from version 2.0.0
df <- tibble::tribble(
~date, ~A, ~B, ~C,
"1/31/2010", NA, 0.017, NA,
"2/28/2010", NA, 0.027, NA,
"3/31/2010", NA, 0.003, 0.003,
"4/30/2010", -0.022, 0.018, 0.018,
"5/31/2010", -0.036, 0.02, 0.02,
"6/30/2010", -0.046, 0.023, 0.023,
"7/31/2010", NA, 0.027, 0.027,
"8/31/2010", -0.022, 0.008, 0.008,
"9/30/2010", 0.059, -0.003, -0.003,
"10/31/2010", 0.024, 0.058, 0.058,
"11/30/2010", NA, 0.023, NA,
"12/31/2010", NA, 0.014, NA
)
I want to calculate autocorrelation (acf) of multiple time series. Ignoring the imputation part, I need to:
I started here and got stuck:
df %>%
  mutate(date = mdy(date)) %>%
  pivot_longer(cols = -date) %>%
  as_tsibble(key = name, index = date) %>%
  ACF()
The expected output would contain the autocorrelations of every usable series, by lag. For example, B would have 10-11 values for 10 lags, and likewise for the other series.
We can make use of rle. Let's define a concise custom function has_middle_NA:
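has_middle_NA <- function(x) {
  # one TRUE/FALSE per run of consecutive NA / non-NA values
  rl <- rle(is.na(x))$values
  # drop the first and last run; any remaining NA run lies "in the middle"
  any(rl[-c(1, length(rl))])
}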
Then
df %>%
  group_by(date) %>%
  select_if(~ !has_middle_NA(.x)) %>%
  ungroup()
## A tibble: 12 x 3
#   date            B      C
#   <chr>       <dbl>  <dbl>
# 1 1/31/2010   0.017 NA
# 2 2/28/2010   0.027 NA
# 3 3/31/2010   0.003  0.003
# 4 4/30/2010   0.018  0.018
# 5 5/31/2010   0.02   0.02
# 6 6/30/2010   0.023  0.023
# 7 7/31/2010   0.027  0.027
# 8 8/31/2010   0.008  0.008
# 9 9/30/2010  -0.003 -0.003
#10 10/31/2010  0.058  0.058
#11 11/30/2010  0.023 NA
#12 12/31/2010  0.014 NA
This removes every column that has NAs other than leading or trailing ones (here, column A).
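To see why only A is dropped, it may help to look at what rle returns for each column. The following is a small base-R sketch that re-enters the toy columns by hand as plain vectors (the vector names A, B, C and the object checks are just for illustration) and applies has_middle_NA to each:

```r
# Columns of the toy data, typed in as plain vectors
A <- c(NA, NA, NA, -0.022, -0.036, -0.046, NA, -0.022, 0.059, 0.024, NA, NA)
B <- c(0.017, 0.027, 0.003, 0.018, 0.02, 0.023, 0.027, 0.008, -0.003, 0.058, 0.023, 0.014)
C <- c(NA, NA, 0.003, 0.018, 0.02, 0.023, 0.027, 0.008, -0.003, 0.058, NA, NA)

has_middle_NA <- function(x) {
  rl <- rle(is.na(x))$values   # one TRUE/FALSE per run of NA / non-NA
  any(rl[-c(1, length(rl))])   # ignore the first and last run
}

# Runs for A: NA, non-NA, NA, non-NA, NA
rle(is.na(A))$values
# [1]  TRUE FALSE  TRUE FALSE  TRUE

# The middle NA run makes A fail the check; B and C pass
checks <- sapply(list(A = A, B = B, C = C), has_middle_NA)
checks
#     A     B     C
#  TRUE FALSE FALSE
```

B consists of a single non-NA run, so nothing is left after dropping the first and last run, and any() of an empty vector is FALSE.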
It's still not really clear to me what you're trying to do with ACF based on the data you give; but perhaps this helps.
The key is to treat your data as monthly data, ignoring the day. We can then:
1. convert the dates to zoo::yearmon,
2. remove columns with NAs "in the middle",
3. create a tsibble from every column,
4. use feasts::ACF to calculate the ACF for every column and store the result in a list column of tsibbles.
library(tsibble)
library(tidyverse)
library(feasts)
library(zoo)
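df <- df %>%
  # keep only month and year of each date
  mutate(date = as.yearmon(date, format = "%m/%d/%Y")) %>%
  # drop columns with NAs "in the middle"
  group_by(date) %>%
  select_if(~ !has_middle_NA(.x)) %>%
  ungroup() %>%
  # one nested data frame per series
  pivot_longer(-date) %>%
  group_by(name) %>%
  nest() %>%
  # one tsibble and one ACF result per series
  mutate(
    data = map(data, as_tsibble),
    ACF = map(data, ACF))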
## A tibble: 2 x 3
## Groups:   name [2]
#  name  data               ACF
#  <chr> <list>             <list>
#1 B     <tsibble [12 × 2]> <tsibble [10 × 2]>
#2 C     <tsibble [12 × 2]> <tsibble [7 × 2]>
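As a rough cross-check that does not depend on tsibble or feasts, the same per-series autocorrelations can be sketched with base R's stats::acf. This is a different route from the pipeline above, not a reproduction of it: the helper acf_trimmed is hypothetical, the columns B and C are re-entered by hand, and leading/trailing NAs are trimmed explicitly before calling acf.

```r
# Columns that survive the has_middle_NA filter, typed in as plain vectors
B <- c(0.017, 0.027, 0.003, 0.018, 0.02, 0.023, 0.027, 0.008, -0.003, 0.058, 0.023, 0.014)
C <- c(NA, NA, 0.003, 0.018, 0.02, 0.023, 0.027, 0.008, -0.003, 0.058, NA, NA)

# Hypothetical helper: trim leading/trailing NAs, then compute the ACF.
# Assumes x has no NAs "in the middle" (acf would otherwise stop with an error).
acf_trimmed <- function(x, lag_max = 10) {
  keep <- range(which(!is.na(x)))      # first and last non-NA position
  x <- x[keep[1]:keep[2]]
  res <- stats::acf(x, lag.max = lag_max, plot = FALSE)
  data.frame(lag = as.vector(res$lag), acf = as.vector(res$acf))
}

acf_list <- lapply(list(B = B, C = C), acf_trimmed)

# stats::acf includes lag 0 and caps lag.max at n - 1, so
# B (12 values) has 11 rows and C (8 values after trimming) has 8 rows
sapply(acf_list, nrow)
#  B  C
# 11  8
```

Unlike the feasts output above, stats::acf reports lag 0 (always 1), which is why each series has one more row here than in the ACF list column.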