I have a dataset that I want to process using tq_mutate and rollapply with different parameter values.
Currently I'm using a for loop to go over all the parameter values, but I doubt this is the fastest way to do the task, especially as I will eventually be looking at a large number of parameter values. How could the for loop be improved or removed? I suspect the answer involves purrr::map or some form of parallelism (multithreading, multiple cores, etc.), but I've not been able to find useful examples online.
Below is some sample code. Please ignore the simplicity of the dataset and of the output of my_bogus_function; they are for illustrative purposes only. What I want to do is iterate over many different V0 values.
library(dplyr)
library(tidyverse)
library(broom)
library(tidyquant)
my_bogus_function <- function(df, V0 = 1925) {
  # WILL HAVE SOMETHING MORE SOPHISTICATED IN HERE BUT KEEPING IT SIMPLE
  # FOR THE PURPOSES OF THE QUESTION
  c(V0, V0 * 2)
}
window_size <- 7 * 24
cnames <- c("foo", "bar")
df <- c("FB") %>%
  tq_get(get = "stock.prices", from = "2016-01-01", to = "2017-01-01") %>%
  dplyr::select("date", "open")
# CAN THIS LOOP BE DONE IN A MORE EFFICIENT MANNER?
for (i in 1825:1830) {
  df <- df %>%
    tq_mutate(mutate_fun = rollapply,
              width      = window_size,
              by.column  = FALSE,
              FUN        = my_bogus_function,
              col_rename = gsub("$", sprintf(".%d", i), cnames),
              V0         = i
    )
}
# END OF THE FOR LOOP I WANT FASTER
R runs on a single core by default, so I got an improvement by using the parallel, doSNOW and foreach packages, which allow multiple cores to be used (note that I'm on a Windows machine, so some other packages are not available).
I'm sure there are other ways out there to multithread/parallelise/vectorise this code.
Here is the code for anyone interested.
library(dplyr)
library(tidyverse)
library(tidyquant)
library(parallel)
library(doSNOW)
library(foreach)
window_size <- 7 * 24
cnames <- c("foo", "bar")
df <- c("FB") %>%
  tq_get(get = "stock.prices", from = "2016-01-01", to = "2017-01-01") %>%
  dplyr::select("date", "open")
my_bogus_function <- function(df, V0 = 1925) {
  # WILL HAVE SOMETHING MORE SOPHISTICATED IN HERE BUT KEEPING IT SIMPLE
  # FOR THE PURPOSES OF THE QUESTION
  c(V0, V0 * 2)
}
# CAN THIS LOOP BE DONE IN A MORE EFFICIENT/FASTER MANNER? YES
numCores <- detectCores() # get the number of cores available
cl <- makeCluster(numCores, type = "SOCK")
registerDoSNOW(cl)
# Function to combine the outputs
mycombinefunc <- function(a,b){merge(a, b, by = c("date","open"))}
# Run the loop over multiple cores
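# Workers are fresh R sessions, so the packages used inside the loop body
# are loaded on each of them via .packages
meh <- foreach(i = 1825:1830,
               .combine  = "mycombinefunc",
               .packages = c("dplyr", "tidyquant", "zoo")) %dopar% {
  message(i) # printed on the worker, so usually not visible in the main console
  df %>%
    # Adjust everything
    tq_mutate(mutate_fun = rollapply,
              width      = window_size,
              by.column  = FALSE,
              FUN        = my_bogus_function,
              col_rename = gsub("$", sprintf(".%d", i), cnames),
              V0         = i
    )
}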
stopCluster(cl)
# END OF THE FOR LOOP I WANTED FASTER
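Since the question mentioned purrr::map, here is a rough sketch of what a loop-free, single-core version might look like. It is not a drop-in replacement for the tq_mutate call: it calls zoo::rollapply directly on the open column, uses a synthetic data frame instead of the tq_get download so that it runs offline, and assumes right-aligned windows with NA padding at the top. The helper roll_one and the synthetic data are made up for illustration.

```r
library(zoo)    # rollapply
library(purrr)  # map, reduce
library(dplyr)  # tibble, as_tibble, bind_cols

# Synthetic stand-in for the downloaded prices (hypothetical data)
set.seed(1)
n  <- 300
df <- tibble(
  date = seq(as.Date("2016-01-01"), by = "day", length.out = n),
  open = 100 + cumsum(rnorm(n))
)

window_size <- 7 * 24
cnames <- c("foo", "bar")

my_bogus_function <- function(x, V0 = 1925) {
  c(V0, V0 * 2)
}

# Hypothetical helper: one rolling pass for a single V0 value.
# Returns a two-column tibble named foo.<V0> and bar.<V0>.
roll_one <- function(v0) {
  out <- rollapply(df$open, width = window_size,
                   FUN = my_bogus_function, V0 = v0)
  # rollapply returns one row per complete window; pad the top with NA
  # so the result lines up with df (assumes right-aligned windows)
  pad <- matrix(NA_real_, nrow = window_size - 1, ncol = ncol(out))
  out <- rbind(pad, out)
  colnames(out) <- paste0(cnames, ".", v0)
  as_tibble(out)
}

# Compute each block independently, then bind them all onto df in one go
pieces <- map(1825:1830, roll_one)
result <- reduce(pieces, bind_cols, .init = df)
```

Because each V0 is computed independently of the others, this shape should also be easy to parallelise later, for example by swapping map for a parallel mapper such as furrr::future_map, though I have not timed that here.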