
How to add progress bar inside dplyr chain in R


I like dplyr's progress_estimated() function, but I can't figure out how to get a progress bar working inside a dplyr chain. There is a reproducible example at the bottom of this post.

I have a pretty big data.frame like this:

                cdatetime latitude longitude   
1 2013-01-11 06:40:00 CST 49.74697 -93.30951
2 2013-01-12 15:55:00 CST 49.74697 -93.30951 
3 2013-01-07 20:30:00 CST 49.74697 -93.30951 

and I'd like to calculate sunrise times for each date, using the libraries

library(dplyr)
library(StreamMetabolism)

I can get dplyr's progress_estimated bar to work within a loop, e.g.:

Ugly loop (works)

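# Create a progress bar with one tick per row
p <- progress_estimated(nrow(test))

for (i in seq_len(nrow(test))) {
  p$tick()$print()
  # Parse the date-time in the correct timezone
  datetime <- as.POSIXct(substr(test$cdatetime[i], 1, 20), tz = "CST6CDT")
  # Keep only the sunrise time (row 1, column 1 of the result)
  test$sunrise[i] <- sunrise.set(test$latitude[i], test$longitude[i], datetime, "CST6CDT", num.days = 1)[1,1]
}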

but how can I nest it in my function, so I can avoid using a loop?

I'd prefer to use something like this:

SunriseSet <- function(dataframe, timezone){
  dataframe %>% 
    rowwise() %>% 
    mutate(# calculate the date-time using the correct timezone
      datetime = as.POSIXct(substr(cdatetime, 1, 20), tz = timezone),
      # Get the time of sunrise and sunset on this day, at the county midpoint
      sunrise = sunrise.set(latitude, longitude, datetime, timezone, num.days = 1)[1,1])
}

How can I get a progress bar when calling it like this?

test2 <- SunriseSet(test, "CST6CDT")

Here's some example data:

test <- data.frame(cdatetime = rep("2013-01-11 06:40:00", 300),
                   latitude = seq(49.74697, 50.04695, 0.001),
                   longitude = seq(-93.30951, -93.27960, 0.0001))

Solution

  • Rather than using rowwise(), perhaps try pairing the map* functions from purrr with progress_estimated(). This answer follows the approach from https://rud.is/b/2017/03/27/all-in-on-r%E2%81%B4-progress-bars-on-first-post/.

    First, wrap your function in another function that updates the progress bar:

    SunriseSet <- function(lat, long, date, timezone, num.days, .pb = NULL) {
      # Advance the progress bar only if one was supplied and it is not yet complete
      if (!is.null(.pb) && .pb$i < .pb$n) .pb$tick()$print()
      sunrise.set(lat, long, date, timezone, num.days)
    }
    

    Then, iterate through your inputs with pmap, or pmap_df (to bind the outputs into a dataframe):

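    library(purrr)
    # One tick per row; min_time = 0 displays the bar without waiting
    pb <- progress_estimated(nrow(test), 0)
    test2 <- test %>% 
      mutate(
        # pmap_df() calls SunriseSet once per row and row-binds the results
        sunrise = pmap_df(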
          list(
            lat = latitude, 
            long = longitude,
            date = as.character(cdatetime)
          ),
          SunriseSet,
          timezone = "CST6CDT", num.days = 1, .pb = pb
        )$sunrise
      )
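
If progress_estimated() is not available to you (it has been deprecated in more recent dplyr releases), the same wrap-and-tick idea can be sketched with base R's txtProgressBar. This is only a sketch under assumptions: slow_fun is a hypothetical stand-in for sunrise.set, with_progress is a helper name made up for illustration, and mapply is swapped in for pmap so that nothing beyond base R is needed.

```r
# Hypothetical stand-in for an expensive per-row function such as sunrise.set
slow_fun <- function(lat, long) {
  lat + long
}

# Hypothetical helper: wrap a function so that every call advances a
# base R text progress bar expecting n calls in total
with_progress <- function(f, n) {
  pb <- utils::txtProgressBar(min = 0, max = n, style = 3)
  i <- 0
  function(...) {
    i <<- i + 1                      # the call counter lives in the closure
    utils::setTxtProgressBar(pb, i)
    if (i == n) close(pb)            # finish the bar after the last call
    f(...)
  }
}

# Example data shaped like the question's (300 rows)
test <- data.frame(
  latitude  = seq(49.74697, by = 0.001,  length.out = 300),
  longitude = seq(-93.30951, by = 0.0001, length.out = 300)
)

# Wrap once, then call the wrapped function row by row
slow_fun_pb <- with_progress(slow_fun, nrow(test))
test$result <- mapply(slow_fun_pb, test$latitude, test$longitude)
```

The wrapped function should also work inside a chain, e.g. test %>% mutate(result = mapply(slow_fun_pb, latitude, longitude)), because mutate simply evaluates that call over the whole columns. Since the counter is stored in the closure, create a fresh wrapper with with_progress() for each run.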