
Plotting of very large data sets in R


How can I plot a very large data set in R?

I'd like to use a boxplot, a violin plot, or something similar. Not all of the data fits in memory. Can I read it in incrementally and calculate the summaries needed to make these plots? If so, how?


Solution

  • To supplement my comment on Dmitri's answer, here is a function that calculates quantiles using the ff package for big-data handling:

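    library(ff)  # provides ff vectors and ffsort()

    # Quantiles of an ff vector; extra arguments are passed on to ffsort().
    ffquantile <- function(ffv, qs = c(0, 0.25, 0.5, 0.75, 1), ...) {
      stopifnot(all(qs >= 0 & qs <= 1))
      ffvs <- ffsort(ffv, ...)          # sorted copy of the data
      j <- qs * (length(ffv) - 1) + 1   # fractional position of each quantile
      jf <- floor(j)
      jc <- ceiling(j)
      # average the two neighbouring order statistics for each quantile
      rowSums(matrix(ffvs[c(jf, jc)], length(qs), 2)) / 2
    }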
    

    This is an exact algorithm, so it has to sort the whole vector and may therefore take a long time on large data.
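
Once the five quantiles are known, the boxplot itself can be drawn without the raw data. The following is a minimal sketch using base R's `bxp()`, which draws from a precomputed summary. The helper name `summary_to_bxp` is hypothetical, and the small in-memory vector `x` is only a stand-in so the example runs without ff; with real data the five numbers would presumably come from something like `ffquantile(ffv)`.

```r
# Build the list structure that bxp() expects from a five-number summary.
# `five` is c(min, Q1, median, Q3, max); `n` is the number of observations.
# summary_to_bxp is a hypothetical helper name, not part of any package.
summary_to_bxp <- function(five, n, name = "x") {
  stopifnot(length(five) == 5, !is.unsorted(five))
  list(
    stats = matrix(five, ncol = 1),                   # one column per box
    n     = n,
    conf  = matrix(c(NA_real_, NA_real_), ncol = 1),  # notch limits, unused here
    out   = numeric(0),                               # no outliers are tracked
    group = numeric(0),
    names = name
  )
}

# Stand-in for quantiles computed out of memory (assumption: small demo data)
x <- c(2, 4, 4, 5, 7, 9, 10, 12, 15)
five <- as.numeric(quantile(x, c(0, 0.25, 0.5, 0.75, 1)))
z <- summary_to_bxp(five, n = length(x), name = "x")

# Draw the box from the summary alone
bxp(z, main = "Boxplot from precomputed quantiles")
```

Note that in this sketch the whiskers run to the minimum and maximum, because no individual outliers are kept. This differs from the default `boxplot()` rule, which limits whiskers to 1.5 times the interquartile range.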