I have some data which looks like:
head:
dfID date group groupValues
<chr> <date> <chr> <dbl>
1 df1 2020-03-01 grp1 0.175
2 df1 2020-03-01 grp2 0.150
3 df1 2020-03-01 grp3 0.0509
tail:
dfID date group groupValues
<chr> <date> <chr> <dbl>
1 df3 2020-06-29 grp7 0.705
2 df3 2020-06-29 grp8 0.473
3 df3 2020-06-29 grp9 0.900
Which is a time series data with 3 unique dfID
's and 9 unique group
's. Filtering the date column to a single day I have (3 df's and 9 groups):
df %>%
filter(date == "2020-03-01")
dfID date group groupValues
<chr> <date> <chr> <dbl>
1 df1 2020-03-01 grp1 0.175
2 df1 2020-03-01 grp2 0.150
3 df1 2020-03-01 grp3 0.0509
4 df2 2020-03-01 grp4 0.133
5 df2 2020-03-01 grp5 0.779
6 df2 2020-03-01 grp6 0.506
7 df3 2020-03-01 grp7 0.868
8 df3 2020-03-01 grp8 0.552
9 df3 2020-03-01 grp9 0.274
I next want to split the data up into a pairwise combination of the dfID
's.
combinedSplit <- combn(levels(as.factor(df$dfID)), m = 2, FUN = function(x)
df %>%
filter(dfID %in% x), simplify = FALSE)
names(combinedSplit) <- combn(levels(as.factor(df$dfID)), m = 2, str_c, collapse="_")
So now I have a list of 3 dfs - one for each pairwise combination of the dfID
's:
df1_df2
df1_df3
df2_df3
Now, I want to apply the rolling_origin
function from the rsample
package to split the time series into training and testing data. I can naively apply the function by mapping over the lists.
rolledData <- combinedSplit %>%
map(., ~ rolling_origin(
data = .,
initial = 60,
assess = 1,
cumulative = FALSE,
skip = 0
)
)
I can access the tail of the first split for the combination df1_df2
.
map(rolledData$df1_df2$splits, ~analysis(.x))[[1]] %>% tail()
dfID date group groupValues
<chr> <date> <chr> <dbl>
1 df1 2020-03-19 grp1 0.528
2 df1 2020-03-19 grp2 0.394
3 df1 2020-03-19 grp3 0.532
4 df1 2020-03-20 grp1 0.586
5 df1 2020-03-20 grp2 0.369
6 df1 2020-03-20 grp3 0.153
Which is incorrect. In the rolling_origin
function I stated the training/assessment period to be 60 perids (days) but this data ends on the 20th March. This is because it is taking the first 60 observations of the data not of the time series (3 group
's * 20 days).
So I want to apply the rolling_origin
function to each of the grp
's - with each grp
having 60 day rolling windows.
Here, I thought it would be best to first nest()
the group
's and then apply the rolling_origin
function, since then each of the group
's would be separate and 60 days will correspond to grp1
, grp2
and grp3
. Then, unnest()
the group
's to put grp1
, grp2
and grp3
back into the same data frame.
nestedRolledData <- map(combinedSplit, ~group_by(.x, group) %>%
nest() %>%
mutate(
rolledData = map(data, ~.x %>%
rolling_origin(
data = .,
initial = 60,
assess = 0,
cumulative = FALSE,
skip = 0
)
)
)
)
I am having difficulty with this unnesting.
Inspecting the structure of the lists:
df1_df2
, df1_df3
and df2_df3
combinations.rolledData
contains 6 lists) contains a list for each of the grp
's (each df
das 3 grp
's).grp
's) contains a splits
list (generated from the rolling_origin
function).splits
list) contains [[1]]
... [[63]]
. The lists here correspond to each of the rolling_origin
function training splits. In the rolling_origin
function I set initial = 60
and the model creates 63 splits of the data between 2020-03-01 and 2020-06-30. (If I changed initial = 90
I get 33 splits. So the number of lists here is dependent on the time duration in the data and the initial
part in the rolling_origin
function).I can access the first splits
using map(nestedRolledData$df1_df2$rolledData[[1]]$splits, ~analysis(.x))[[1]]
which gives me a data frame of 60 observations starting on 2020-03-01
and ending 2020-04-29
. For the second split in this list map(nestedRolledData$df1_df2$rolledData[[1]]$splits, ~analysis(.x))[[2]]
I have another data frame containing 60 observations starting this time on the 2020-03-02
and ending on the 2020-04-30
(so this data has been shifted by a single day). I can do this up until [[63]]
where it starts on 2020-05-02
and ends on 2020-06-30
(which is the last day in my data).
This is what I want - i.e. the data made the correct time series splits for each grp
. Now I want to unnest these and put them back into the correct data frames. Going back to layer 3 of the list which contains [[1]], ... , [[6]]
6 lists. These correspond to each of the 3 groups for the 2 dfIDs
. So I essentially want to merge these lists together.
If I were to contruct these manually it might look like:
# grp 1:6 for rolling_origin split 1
map(nestedRolledData$df1_df2$rolledData[[1]]$splits, ~analysis(.x))[[1]]
map(nestedRolledData$df1_df2$rolledData[[2]]$splits, ~analysis(.x))[[1]]
map(nestedRolledData$df1_df2$rolledData[[3]]$splits, ~analysis(.x))[[1]]
map(nestedRolledData$df1_df2$rolledData[[4]]$splits, ~analysis(.x))[[1]]
map(nestedRolledData$df1_df2$rolledData[[5]]$splits, ~analysis(.x))[[1]]
map(nestedRolledData$df1_df2$rolledData[[6]]$splits, ~analysis(.x))[[1]]
# grp 1:6 for rolling_origin split 2
map(nestedRolledData$df1_df2$rolledData[[1]]$splits, ~analysis(.x))[[2]]
map(nestedRolledData$df1_df2$rolledData[[2]]$splits, ~analysis(.x))[[2]]
map(nestedRolledData$df1_df2$rolledData[[3]]$splits, ~analysis(.x))[[2]]
map(nestedRolledData$df1_df2$rolledData[[4]]$splits, ~analysis(.x))[[2]]
map(nestedRolledData$df1_df2$rolledData[[5]]$splits, ~analysis(.x))[[2]]
map(nestedRolledData$df1_df2$rolledData[[6]]$splits, ~analysis(.x))[[2]]
# ...
# grp 1:6 for rolling_origin split 63
map(nestedRolledData$df1_df2$rolledData[[1]]$splits, ~analysis(.x))[[63]]
map(nestedRolledData$df1_df2$rolledData[[2]]$splits, ~analysis(.x))[[63]]
map(nestedRolledData$df1_df2$rolledData[[3]]$splits, ~analysis(.x))[[63]]
map(nestedRolledData$df1_df2$rolledData[[4]]$splits, ~analysis(.x))[[63]]
map(nestedRolledData$df1_df2$rolledData[[5]]$splits, ~analysis(.x))[[63]]
map(nestedRolledData$df1_df2$rolledData[[6]]$splits, ~analysis(.x))[[63]]
Data:
library(rsample)
df1 = data.frame(
date = seq.Date(from = as.Date("2020-03-01"), to = as.Date("2020-06-30"), by = "days"),
grp1 = runif(122),
grp2 = runif(122),
grp3 = runif(122)
) %>%
pivot_longer(cols = where(is.numeric), names_to = "group", values_to = "groupValues") %>%
add_column(dfID = "df1")
df2 = data.frame(
date = seq.Date(from = as.Date("2020-03-01"), to = as.Date("2020-06-30"), by = "days"),
grp4 = runif(122),
grp5 = runif(122),
grp6 = runif(122)
) %>%
pivot_longer(cols = where(is.numeric), names_to = "group", values_to = "groupValues") %>%
add_column(dfID = "df2")
df3 = data.frame(
date = seq.Date(from = as.Date("2020-03-01"), to = as.Date("2020-06-30"), by = "days"),
grp7 = runif(122),
grp8 = runif(122),
grp9 = runif(122)
) %>%
pivot_longer(cols = where(is.numeric), names_to = "group", values_to = "groupValues") %>%
add_column(dfID = "df3")
df <- bind_rows(df1, df2, df3) %>%
relocate(dfID, .before = date)
map(rolledData$splits, ~analysis(.x))[[1]] %>% tail()
combinedSplit <- combn(levels(as.factor(df$dfID)), m = 2, FUN = function(x)
df %>%
filter(dfID %in% x), simplify = FALSE)
names(combinedSplit) <- combn(levels(as.factor(df$dfID)), m = 2, str_c, collapse="_")
rolledData <- combinedSplit %>%
map(., ~ rolling_origin(
data = .,
initial = 60,
assess = 1,
cumulative = FALSE,
skip = 0
)
)
map(rolledData$df1_df2$splits, ~analysis(.x))[[1]] %>% tail()
nestedRolledData <- map(combinedSplit, ~group_by(.x, group) %>%
nest() %>%
mutate(
rolledData = map(data, ~.x %>%
rolling_origin(
data = .,
initial = 60,
assess = 0,
cumulative = FALSE,
skip = 0
)
)
)
)
May be we need a 3 nested map
library(purrr)
library(rsample)
out <- map(nestedRolledData, ~ map(.x$rolledData, ~ map(.x$splits, analysis)))
str(out, max.level = 3)
#List of 3
# $ df1_df2:List of 6
# ..$ :List of 63
# .. ..$ : tibble [60 × 3] (S3: tbl_df/tbl/data.frame)
# .. ..$ : tibble [60 × 3] (S3: tbl_df/tbl/data.frame)
# .. ..$ : tibble [60 × 3] (S3: tbl_df/tbl/data.frame)
# .. ..$ : tibble [60 × 3] (S3: tbl_df/tbl/data.frame)
# .. ..$ : tibble [60 × 3] (S3: tbl_df/tbl/data.frame)
# ..