r sparklyr magrittr

How do I use the pipe operator or something related to break a pipeline into two steps?


Is there a way to break this into two steps so that the ml_logistic_regression() can be applied separately to flights_pipeline?

Below is working code for the pipeline:

flights_pipeline <- ml_pipeline(sc) %>%
  ft_dplyr_transformer(
    tbl = df
    ) %>%
  ft_binarizer(
    input_col = "dep_delay",
    output_col = "delayed",
    threshold = 15
  ) %>%
  ft_bucketizer(
    input_col = "sched_dep_time",
    output_col = "hours",
    splits = c(400, 800, 1200, 1600, 2000, 2400)
  )  %>%
  ft_r_formula(delayed ~ month + day + hours + distance) %>% 
  ml_logistic_regression()

This is my attempt. I'd like to break it into two steps, something like this:

flights_pipeline <- ml_pipeline(sc) %>%
  ft_dplyr_transformer(
    tbl = df
    ) %>%
  ft_binarizer(
    input_col = "dep_delay",
    output_col = "delayed",
    threshold = 15
  ) %>%
  ft_bucketizer(
    input_col = "sched_dep_time",
    output_col = "hours",
    splits = c(400, 800, 1200, 1600, 2000, 2400)
  )  %>%
  ft_r_formula(delayed ~ month + day + hours + distance)

flights_pipeline_with_model <- flights_pipeline %>% 
  ml_logistic_regression()


Solution

  • The goal isn't entirely clear from the description and the OP's second code block. If the intention is to save an intermediate object partway through the pipe and then continue piping, pipeR's side-effect assignment, (~ name) used with its %>>% operator, could help:

    library(pipeR)
    ml_pipeline(sc) %>%
      ft_dplyr_transformer(
        tbl = df
        ) %>%
      ft_binarizer(
        input_col = "dep_delay",
        output_col = "delayed",
        threshold = 15
      ) %>%
      ft_bucketizer(
        input_col = "sched_dep_time",
        output_col = "hours",
        splits = c(400, 800, 1200, 1600, 2000, 2400)
      )  %>%
      ft_r_formula(delayed ~ month + day + hours + distance) %>>%
      (~ flights_pipeline) %>>%  # pipeR: save the value so far as flights_pipeline, then pass it on
      ml_logistic_regression()
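
Separately from pipeR, the two-step version in the question should also work as written: per the sparklyr documentation, the ml_* functions return a pipeline with the new stage appended when they are given an ml_pipeline. Below is a minimal sketch of that. It assumes a local Spark installation and the nycflights13 data; the connection, the definition of `df`, and the stage-count check at the end are assumptions for illustration and are not from the original post.

```r
library(sparklyr)
library(dplyr)
library(nycflights13)

# Assumption: Spark is installed locally (e.g. via spark_install()).
sc <- spark_connect(master = "local")

# Assumption: `df` is a lazy Spark table along these lines; the question does not show it.
df <- sdf_copy_to(sc, flights, name = "flights", overwrite = TRUE) %>%
  filter(!is.na(dep_delay)) %>%
  select(dep_delay, sched_dep_time, month, day, distance)

# Step 1: feature-engineering stages only.
flights_pipeline <- ml_pipeline(sc) %>%
  ft_dplyr_transformer(tbl = df) %>%
  ft_binarizer(input_col = "dep_delay", output_col = "delayed", threshold = 15) %>%
  ft_bucketizer(input_col = "sched_dep_time", output_col = "hours",
                splits = c(400, 800, 1200, 1600, 2000, 2400)) %>%
  ft_r_formula(delayed ~ month + day + hours + distance)

# Step 2: append the estimator to the saved pipeline.
flights_pipeline_with_model <- flights_pipeline %>%
  ml_logistic_regression()

# Count the stages in each pipeline object.
n_before <- length(ml_stages(flights_pipeline))
n_after <- length(ml_stages(flights_pipeline_with_model))

spark_disconnect(sc)
```

Nothing is fitted here; the pipeline is only a description of the stages. Fitting would be a further step, such as passing the pipeline and a Spark table to ml_fit().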