Is there a way to break this into two steps, so that ml_logistic_regression() can be applied to flights_pipeline separately?
Below is working code for the pipeline:
```r
flights_pipeline <- ml_pipeline(sc) %>%
  ft_dplyr_transformer(
    tbl = df
  ) %>%
  ft_binarizer(
    input_col = "dep_delay",
    output_col = "delayed",
    threshold = 15
  ) %>%
  ft_bucketizer(
    input_col = "sched_dep_time",
    output_col = "hours",
    splits = c(400, 800, 1200, 1600, 2000, 2400)
  ) %>%
  ft_r_formula(delayed ~ month + day + hours + distance) %>%
  ml_logistic_regression()
```
This is my attempt. I'd like to break it into two steps, something like this:
```r
# Step 1: feature transformers only
flights_pipeline <- ml_pipeline(sc) %>%
  ft_dplyr_transformer(
    tbl = df
  ) %>%
  ft_binarizer(
    input_col = "dep_delay",
    output_col = "delayed",
    threshold = 15
  ) %>%
  ft_bucketizer(
    input_col = "sched_dep_time",
    output_col = "hours",
    splits = c(400, 800, 1200, 1600, 2000, 2400)
  ) %>%
  ft_r_formula(delayed ~ month + day + hours + distance)

# Step 2: append the model to the pipeline from step 1
flights_pipeline_with_model <- flights_pipeline %>%
  ml_logistic_regression()
```
It's not entirely clear from the description and the OP's second code block what is wanted. If the intention is to save an intermediate object in the middle of the pipe and then continue the same pipe, the pipeR package could help:
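```r
library(pipeR)

# `%>>%` is pipeR's pipe. `(~flights_pipeline)` saves the pipeline built so far
# (feature transformers only) as `flights_pipeline` and passes it on unchanged,
# so ml_logistic_regression() is appended to a copy that also gets stored.
flights_pipeline_with_model <- ml_pipeline(sc) %>%
  ft_dplyr_transformer(
    tbl = df
  ) %>%
  ft_binarizer(
    input_col = "dep_delay",
    output_col = "delayed",
    threshold = 15
  ) %>%
  ft_bucketizer(
    input_col = "sched_dep_time",
    output_col = "hours",
    splits = c(400, 800, 1200, 1600, 2000, 2400)
  ) %>%
  ft_r_formula(delayed ~ month + day + hours + distance) %>>%
  (~flights_pipeline) %>>%
  ml_logistic_regression()
```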
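To see how pipeR's `(~ symbol)` assignment behaves on its own, here is a minimal sketch that needs no Spark connection. It uses the built-in `mtcars` data, and the names `four_cyl` and `n_rows` are purely illustrative. The value flowing through the pipe at `(~ four_cyl)` is saved under that name and passed on unchanged, which is the same trick used in the answer to keep `flights_pipeline` around.

```r
library(pipeR)

# Filter mtcars, save the intermediate data frame as `four_cyl`
# with pipeR's (~ symbol) assignment, then keep piping into nrow().
n_rows <- mtcars %>>%
  subset(cyl == 4) %>>%
  (~ four_cyl) %>>%
  nrow()

print(n_rows)          # row count computed at the end of the pipe
print(dim(four_cyl))   # the intermediate result saved mid-pipe
```

Both `n_rows` and `four_cyl` end up in the calling environment, so the intermediate result remains available after the pipe has finished.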