r dplyr sample ff

How to take a random sample from an ff object


I want to extract a random sample of 1000 values from a large ff object in R.

I tried sample_frac from the dplyr package, but it fails with the following error:

Error: tbl must be a data frame, not a ffdf object

How can I solve this problem?


Solution

  • You can use the ffbase2 package, which adds a dplyr interface to ff objects:

    install.packages("devtools")
    devtools::install_github("edwindj/ffbase2")
    

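    Then wrap the data as a tbl with the tbl_ffdf function, after which dplyr verbs can be applied to it:

    library(dplyr)
    library(ffbase2)

    # Wrap the data as an ffdf-backed tbl so dplyr verbs accept it
    iris_f <- tbl_ffdf(iris)

    # Example: total petal width per species
    species <- 
       iris_f %>%
       group_by(Species) %>%
       summarise(petal_width = sum(Petal.Width))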
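
The example above aggregates rather than samples. For the random sample itself, here is a minimal sketch that assumes only the ff package is installed: draw row positions with base R's sample() and use them to index the ffdf. Row indexing on an ffdf returns an ordinary data.frame, so this suits cases where the sample itself fits in memory. The names iris_ff, n, idx and iris_sample are illustrative, and iris stands in for the large object.

```r
library(ff)

# Stand-in for a large ffdf; iris has only 150 rows
iris_ff <- as.ffdf(iris)

set.seed(42)   # make the draw reproducible
n <- 100       # would be 1000 for the object in the question

# Draw n distinct row positions; sorting keeps disk access sequential
idx <- sort(sample(nrow(iris_ff), n))

# Indexing rows of an ffdf pulls them into an in-memory data.frame
iris_sample <- iris_ff[idx, ]
```

Because iris_sample is a regular data.frame, it can be passed to dplyr functions directly.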