In the post "aggregation using ffdfdply function in R", there is a line like this:
splitby <- as.character(data$Date, by = 250000)
Just out of curiosity, I wonder what the by argument means. It seems to be related to ff data frames, but I'm not sure. A Google search and the R documentation for as.character and as.vector provided no useful information.
I tried some examples, but the calls below all give the same result.
d <- seq.Date(Sys.Date(), Sys.Date()+10000, by = "day")
as.character(d, by=1)
as.character(d, by=10)
as.character(d, by=100)
If anybody could tell me what it is, I'd appreciate it. Thank you in advance.
Since as.character.ff works by calling the default as.character internally, and since ff vectors can be larger than RAM, the data needs to be processed in chunks. The partition into chunks is handled by the chunk function; in this case, the relevant method is chunk.ff_vector. By default, this calculates the chunk size by dividing getOption("ffbatchbytes") by the record size. However, this behaviour can be overridden by supplying the chunk size via the by argument.
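To make the default calculation concrete, here is a rough base R sketch of the arithmetic. The batch size of 16 MB and the record size of 8 bytes are assumed values chosen for illustration (ff sets the real ffbatchbytes option when it is loaded), and the actual chunk function may balance the chunk lengths differently from this naive split.

```r
# Assumed values for illustration only; not read from ff itself.
batch_bytes  <- 16 * 1024^2  # hypothetical stand-in for getOption("ffbatchbytes")
record_bytes <- 8            # bytes per element of a double vector
n            <- 1e7          # length of the (hypothetical) vector

# Default chunk size: how many records fit into one batch.
default_chunk <- batch_bytes %/% record_bytes

# Naive start and end positions of each chunk.
starts <- seq(1, n, by = default_chunk)
ends   <- pmin(starts + default_chunk - 1, n)
chunks <- data.frame(from = starts, to = ends)
print(chunks)
```

Passing by replaces default_chunk in this picture, so by = 250000 would simply give more, smaller chunks.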
In the example you give, the ff vector will be converted to character 250000 members at a time.
The end result will be the same for any value of by, or with by omitted altogether. Larger values will lead to greater temporary use of RAM but potentially quicker operation.
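The idea can be imitated in plain R without ff. The helper chunked_as_character below is hypothetical (it is not part of ff and is not how ff is implemented); it only mimics converting a vector by elements at a time, to show that the chunk size should not change the result.

```r
# Hypothetical helper, not part of ff: convert x to character,
# processing `by` elements at a time.
chunked_as_character <- function(x, by) {
  if (length(x) == 0) return(character(0))
  out <- character(length(x))
  for (from in seq(1, length(x), by = by)) {
    to <- min(from + by - 1, length(x))
    out[from:to] <- as.character(x[from:to])  # only this slice is converted now
  }
  out
}

d <- seq.Date(as.Date("2020-01-01"), by = "day", length.out = 1000)

# Compare a chunked conversion with the ordinary one, and two chunk sizes with each other.
same_as_default <- identical(chunked_as_character(d, by = 10), as.character(d))
same_across_by  <- identical(chunked_as_character(d, by = 250),
                             chunked_as_character(d, by = 1000))
print(c(same_as_default, same_across_by))
```

Only the size of the slice held in memory at each step changes with by, which is why it matters for an ff vector that does not fit in RAM but makes no visible difference for an ordinary Date vector.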