tl;dr
How do I make "partition" from multidplyr split on multiple columns?
Motivation:
A heavy summarise was using only 1 of my 32 cores, so I am trying multidplyr. However, I need to group on multiple columns.
Example:
The vignette shows partitioning by a single column, but when I do that, my other grouping columns are ignored.
Code:
library(dplyr)
library(multidplyr)
library(nycflights13)
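flights1 <- partition(flights, flight)   # split rows across the workers by the 'flight' column
flights2 <- summarise(flights1, dep_delay = mean(dep_delay, na.rm = TRUE))   # summarise runs on each worker
flights3 <- collect(flights2)   # gather the per-worker results back into the main session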
So how about splitting on year, month, and day?
This doesn't work for me:
flights1 <- partition(flights, list(year, month, day))
flights2 <- summarise(flights1, dep_delay = mean(dep_delay, na.rm = TRUE))
flights3 <- collect(flights2)
I can't seem to make this work. Can you point to a proper or at least effective way to do this?
According to ?partition, the usage for partition is
partition(.data, ..., cluster = get_default_cluster())
where ... are the variables to partition by. Instead of passing in a list of variables, pass in each variable separately, i.e.
partition(flights, year, month, day)
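The usage above is from the older development version of multidplyr. In the later CRAN releases (0.1.0 and up, assumed here), the interface changed: you create workers with new_cluster(), declare the grouping with group_by(), and partition() then takes only the data and the cluster, keeping each group on a single worker. A minimal sketch of the year/month/day case under that assumption, followed by a check against plain single-core dplyr:

```r
# Sketch assuming multidplyr >= 0.1.0 (new_cluster() / partition(data, cluster) API).
library(dplyr)
library(multidplyr)
library(nycflights13)

# Start worker R sessions; 4 is an arbitrary choice for this sketch.
cluster <- new_cluster(4)
# Each worker is a separate R session, so dplyr must be loaded on every one.
cluster_library(cluster, "dplyr")

daily <- flights %>%
  group_by(year, month, day) %>%   # group on all three columns
  partition(cluster) %>%           # rows of each group are sent to the same worker
  summarise(dep_delay = mean(dep_delay, na.rm = TRUE)) %>%
  collect()                        # bring the results back to the main session

# Same summary computed serially, for comparison.
serial <- flights %>%
  group_by(year, month, day) %>%
  summarise(dep_delay = mean(dep_delay, na.rm = TRUE))

# Row order after collect() is not guaranteed, so sort both before comparing.
all.equal(
  arrange(ungroup(daily), year, month, day)$dep_delay,
  arrange(ungroup(serial), year, month, day)$dep_delay
)
```

The worker count is a tuning choice: on a 32-core machine you could go higher, but copying data to the workers has its own cost, so more workers is not automatically faster.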