From the documentation of save.ffdf:
Using ‘save.ffdf’ automagically sets the ‘finalizer’s of the ‘ff’ vectors to ‘"close"’. This means that the data will be preserved on disk when the object is removed or the R sessions is closed. Data can be deleted either using ‘delete’ or by removing the directory where the object were saved (‘dir’).
I want to start with a small ffdf data frame, add a bit of new data at a time, and grow it on disk. So I did a little experiment:
# in R
library(ff)      # as.ffdf, ff, filename
library(ffbase)  # save.ffdf, transform for ffdf
ffiris = as.ffdf(iris)
save.ffdf(ffiris, dir = "~/Desktop/iris")
# in bash
ls ~/Desktop/iris/
## ffiris$Petal.Length.ff ffiris$Petal.Width.ff ffiris$Sepal.Length.ff ffiris$Sepal.Width.ff ffiris$Species.ff
# in R
# add a new column
ffiris = transform(ffiris, new1 = rep(99, nrow(iris)))
rm(ffiris)
# in bash
ls ~/Desktop/iris/
## ffiris$Petal.Length.ff ffiris$Petal.Width.ff ffiris$Sepal.Length.ff ffiris$Sepal.Width.ff ffiris$Species.ff
It turns out the ff data on disk is not updated automatically when I remove ffiris. What about saving it manually?
# in R
# add a new column
ffiris = transform(ffiris, new1 = rep(99, nrow(iris)))
save.ffdf(ffiris, "~/Desktop/iris")
# in bash
ls ~/Desktop/iris/
## ffiris$Petal.Length.ff ffiris$Petal.Width.ff ffiris$Sepal.Length.ff ffiris$Sepal.Width.ff ffiris$Species.ff
Hmm, still no luck. Why?
What about removing the folder before saving?
# in R
ffiris = as.ffdf(iris)
unlink("~/Desktop/iris", recursive = TRUE, force = TRUE)
save.ffdf(ffiris, "~/Desktop/iris", overwrite = TRUE)
ffiris = transform(ffiris, new1 = rep(99, nrow(iris)))
unlink("~/Desktop/iris", recursive = TRUE, force = TRUE)
save.ffdf(ffiris, "~/Desktop/iris", overwrite = TRUE)
# in bash
ls ~/Desktop/iris/
# ls: /Users/ky/Desktop/iris/: No such file or directory
Even stranger. And even if this worked, it would be terribly inefficient. I am looking for something like:
updateOnDisk(ffiris)
Could anyone help?
ff and ffbase offer out-of-memory R vectors, but they introduce reference semantics, which can cause problems with R idioms.
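The clash between the two semantics can be illustrated in base R alone. The sketch below is only an analogy and does not use ff: it assumes an environment is a fair stand-in for an ff vector, since both are handles to storage that lives outside the R value itself. The function names add_col_df and add_col_env are hypothetical, chosen for illustration. A function that modifies a data.frame argument changes only its local copy, whereas one that modifies an environment changes the caller's object as well.

```r
# Copy semantics: the function works on a local copy of the data.frame,
# so the caller only sees the change if it keeps the returned value.
add_col_df <- function(df) {
  df$new1 <- 99
  df
}

# Reference semantics: an environment is passed as a handle, so a change
# made inside the function is visible to the caller (hypothetical helper).
add_col_env <- function(env) {
  env$new1 <- 99
  invisible(env)
}

df <- data.frame(x = 1:3)
tmp <- add_col_df(df)   # the modified copy lives in tmp
print(names(df))        # the caller's df still has only column x

env <- new.env()
env$x <- 1:3
add_col_env(env)        # return value ignored
print(ls(env))          # yet the caller's env now also contains new1
```

ff vectors sit on the reference side of this divide, which is why an R idiom that expects a fresh copy can end up creating or pointing at different files than you expect.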
R is a functional programming language, meaning that functions do not modify their arguments; they return modified copies instead. In ffbase we implement functions the R way, i.e. transform returns a copy of the original ffdf data.frame. This can be seen by looking at the filenames:
library(ff)      # as.ffdf, ff, filename
library(ffbase)  # save.ffdf, transform for ffdf
ffiris = as.ffdf(iris)
save.ffdf(ffiris, dir = "~/Desktop/iris")
filename(ffiris) # shows files in ~/Desktop/iris
ffiris = transform(ffiris, new1 = 99) # this creates a copy of the whole data.frame!
filename(ffiris)
ffiris$new2 <- ff(rep(99, nrow(iris))) # this creates a new column, but not yet in the right directory
filename(ffiris)
save.ffdf(ffiris, dir = "~/Desktop/iris", overwrite = TRUE) # this fixes that.
transform is currently an inefficient way to add a new column, because it copies the whole data frame (that is R semantics). It does so because the result of transform might be a temporary one, and you don't want to change the original data.
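The same trade-off shows up with transform on an ordinary in-memory data.frame. This is a base R sketch only (no ff involved): the result of transform is a new object, so a temporary result never alters the original, at the cost of building a new data.frame every time. Keeping the change requires assigning the result back.

```r
df <- data.frame(x = 1:3)

# transform() returns a modified copy; df itself is left untouched,
# so tmp can be used as a throwaway intermediate result.
tmp <- transform(df, new1 = 99)
print(names(df))    # still only "x"
print(names(tmp))   # "x" and "new1"

# To keep the new column, the result has to be assigned back explicitly.
df <- transform(df, new1 = 99)
print(names(df))    # now "x" and "new1"
```

For an ffdf the copy is made on disk rather than in memory, which is why assigning a single new ff column with $<- and then re-saving avoids rewriting the existing columns.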
In ffbase2 we are fixing this issue.