
How does the use.cache feature of Packrat work?


Packrat has a use.cache feature to reduce package installation time.

The documentation provides the following info:

use.cache: Install packages into a global cache, which is then shared across projects? The directory to use is read through Sys.getenv("R_PACKRAT_CACHE_DIR"). Not yet implemented for Windows. (logical; defaults to FALSE)

However, running install.packages() doesn't pick up packages that are already installed in the user's library.

How does use.cache work?


Solution

  • Installing with Global Cache Enabled

    Set up the cache with packrat using the following command:

    # Optional: set the location of the cache for this R session:
    #Sys.setenv(R_PACKRAT_CACHE_DIR = "/home/willbowditch/R/packratcache")
    
    packrat::set_opts(use.cache=TRUE)
    

    This is written to the project's packrat/packrat.opts file, which determines whether the cache is used when the project is opened in RStudio:

    auto.snapshot: TRUE
    use.cache: TRUE
    print.banner.on.startup: auto
    vcs.ignore.lib: TRUE
    vcs.ignore.src: FALSE
    external.packages:
    local.repos:
    load.external.packages.on.startup: TRUE
    ignored.packages:
    quiet.package.installation: TRUE
    snapshot.recommended.packages: FALSE
    snapshot.fields:
        Imports
        Depends
        LinkingTo
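
    The file uses a simple "name: value" layout with one option per line. Inside the project, packrat::get_opts("use.cache") is the supported way to query a setting. The helper below is a hypothetical base-R sketch (read_packrat_opt is not part of packrat) for checking a project's setting from outside it, assuming the layout shown above.

    ```r
    # Hypothetical helper (not part of packrat): read one option from a
    # packrat.opts file, assuming the "name: value" layout shown above.
    read_packrat_opt <- function(opts_file, name) {
      lines <- readLines(opts_file, warn = FALSE)
      # Continuation lines (e.g. under snapshot.fields) start with spaces,
      # so only top-level "name:" lines can match here
      hit <- lines[startsWith(lines, paste0(name, ":"))]
      if (length(hit) == 0) return(NA_character_)
      # Drop "name:" and surrounding whitespace; empty options give ""
      trimws(substring(hit[1], nchar(name) + 2))
    }

    # Example, run from the project root:
    # read_packrat_opt("packrat/packrat.opts", "use.cache")
    ```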
    

    Installed packages are stored in the cache and symlinked into the project library, while base packages are symlinked to the system R library in packrat/lib-R:

    ./packrat/lib/x86_64-pc-linux-gnu/3.4.0:
    total 2
    drwxr-xr-x 2 willbowditch staff  4 Jun 14 16:21 .
    drwxr-xr-x 3 willbowditch staff  3 Jun 14 16:20 ..
    lrwxrwxrwx 1 willbowditch staff 99 Jun 14 16:21 CheckDigit -> /home/willbowditch/R/packratcache/v2/library/CheckDigit/0ab3083cafb11382646fdda41ddb8b98/CheckDigit
    lrwxrwxrwx 1 willbowditch staff 93 Jun 14 16:21 packrat -> /home/willbowditch/R/packratcache/v2/library/packrat/6ad605ba7b4b476d84be6632393f5765/packrat
    
    ./packrat/lib-ext:
    total 9
    drwxr-xr-x 2 willbowditch staff 2 Jun 14 16:20 .
    drwxr-xr-x 6 willbowditch staff 9 Jun 14 16:20 ..
    
    ./packrat/lib-R:
    total 24
    drwxr-xr-x 2 willbowditch staff 16 Jun 14 16:20 .
    drwxr-xr-x 6 willbowditch staff  9 Jun 14 16:20 ..
    lrwxrwxrwx 1 willbowditch staff 29 Jun 14 16:20 base -> /usr/local/lib/R/library/base
    lrwxrwxrwx 1 willbowditch staff 33 Jun 14 16:20 compiler -> /usr/local/lib/R/library/compiler
    lrwxrwxrwx 1 willbowditch staff 33 Jun 14 16:20 datasets -> /usr/local/lib/R/library/datasets
    lrwxrwxrwx 1 willbowditch staff 33 Jun 14 16:20 graphics -> /usr/local/lib/R/library/graphics
    lrwxrwxrwx 1 willbowditch staff 34 Jun 14 16:20 grDevices -> /usr/local/lib/R/library/grDevices
    lrwxrwxrwx 1 willbowditch staff 29 Jun 14 16:20 grid -> /usr/local/lib/R/library/grid
    lrwxrwxrwx 1 willbowditch staff 32 Jun 14 16:20 methods -> /usr/local/lib/R/library/methods
    lrwxrwxrwx 1 willbowditch staff 33 Jun 14 16:20 parallel -> /usr/local/lib/R/library/parallel
    lrwxrwxrwx 1 willbowditch staff 32 Jun 14 16:20 splines -> /usr/local/lib/R/library/splines
    lrwxrwxrwx 1 willbowditch staff 30 Jun 14 16:20 stats -> /usr/local/lib/R/library/stats
    lrwxrwxrwx 1 willbowditch staff 31 Jun 14 16:20 stats4 -> /usr/local/lib/R/library/stats4
    lrwxrwxrwx 1 willbowditch staff 30 Jun 14 16:20 tcltk -> /usr/local/lib/R/library/tcltk
    lrwxrwxrwx 1 willbowditch staff 30 Jun 14 16:20 tools -> /usr/local/lib/R/library/tools
    lrwxrwxrwx 1 willbowditch staff 30 Jun 14 16:20 utils -> /usr/local/lib/R/library/utils
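
    The same check can be done from R. The sketch below is a hypothetical base-R helper (show_library_links is not a packrat function) that uses Sys.readlink() to report which entries of a project library are symlinks and whether they point into the cache directory. The cache check is a plain prefix match on the link target, so it assumes the cache path is written the same way in both places.

    ```r
    # Hypothetical helper (not a packrat function): list a project library and
    # report which entries are symlinks and where they point, using base R only.
    show_library_links <- function(lib_dir, cache_dir = "") {
      entries <- list.files(lib_dir, full.names = TRUE)
      # Sys.readlink() returns "" for entries that are not symbolic links
      targets <- Sys.readlink(entries)
      out <- data.frame(
        package = basename(entries),
        symlink = nzchar(targets),
        target = targets,
        stringsAsFactors = FALSE
      )
      # Simple prefix check: does the link point somewhere under cache_dir?
      if (nzchar(cache_dir)) {
        out$in_cache <- out$symlink & startsWith(out$target, cache_dir)
      }
      out
    }

    # Example, assuming the packrat/lib/<platform>/<R version> layout shown above:
    # lib <- file.path("packrat", "lib", R.version$platform,
    #                  as.character(getRversion()))
    # show_library_links(lib, Sys.getenv("R_PACKRAT_CACHE_DIR"))
    ```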
    

    If you install a package with install.packages(), it overwrites the symlink rather than fetching the package from the cache, so the cache can't be used to speed up package installation:

    >install.packages('CheckDigit')
    Installing package into ‘/home/willbowditch/packrattest/packrat/lib/x86_64-pc-linux-gnu/3.4.0’
    (as ‘lib’ is unspecified)
    trying URL 'https://mran.microsoft.com/snapshot/2017-06-07/src/contrib/CheckDigit_0.1-1.tar.gz'
    Content type 'application/octet-stream' length 3777 bytes
    ==================================================
    downloaded 3777 bytes
    
    * installing *source* package ‘CheckDigit’ ...
    ** package ‘CheckDigit’ successfully unpacked and MD5 sums checked
    ** R
    ** inst
    ** preparing package for lazy loading
    ** help
    *** installing help indices
    ** building package indices
    ** testing if installed package can be loaded
    * DONE (CheckDigit)
    
    The downloaded source packages are in
        ‘/tmp/RtmpxAU8pv/downloaded_packages’
    

    It does, however, speed up initializing Packrat projects whose code in the current directory loads packages through require() or library() calls. In that case packrat::init() or packrat::restore() restores the packages from the cache, but only if they have already been used in a cache-enabled Packrat project:

    > packrat::init()
    Initializing packrat project in directory:
    - "~/six"
    Fetching sources for BH (1.62.0-1) ... OK (CRAN current)
    Fetching sources for DBI (0.6-1) ... OK (CRAN current)
    Fetching sources for R6 (2.2.0) ... OK (CRAN current)
    Fetching sources for Rcpp (0.12.10) ... OK (CRAN current)
    Fetching sources for assertthat (0.2.0) ... OK (CRAN current)
    Fetching sources for dplyr (0.5.0) ... OK (CRAN current)
    Fetching sources for lazyeval (0.2.0) ... OK (CRAN current)
    Fetching sources for magrittr (1.5) ... OK (CRAN current)
    Fetching sources for packrat (0.4.8-1) ... OK (CRAN current)
    Fetching sources for stringi (1.1.5) ... OK (CRAN current)
    Fetching sources for tibble (1.3.0) ... OK (CRAN current)
    Fetching sources for tidyr (0.6.2) ... OK (CRAN current)
    Fetching sources for whisker (0.3-2) ... OK (CRAN current)
    Snapshot written to '/home/willbowditch/six/packrat/packrat.lock'
    Installing BH (1.62.0-1) ... 
        OK (symlinked cache)
    Installing DBI (0.6-1) ... 
        OK (symlinked cache)
    Installing R6 (2.2.0) ... 
        OK (symlinked cache)
    Installing Rcpp (0.12.10) ... 
        OK (symlinked cache)
    Installing assertthat (0.2.0) ... 
        OK (symlinked cache)
    Installing lazyeval (0.2.0) ... 
        OK (symlinked cache)
    Installing magrittr (1.5) ... 
        OK (symlinked cache)
    Installing packrat (0.4.8-1) ... 
        OK (symlinked cache)
    Installing stringi (1.1.5) ... 
        OK (symlinked cache)
    Installing whisker (0.3-2) ... 
        OK (symlinked cache)
    Installing tibble (1.3.0) ... 
        OK (symlinked cache)
    Installing dplyr (0.5.0) ... 
        OK (symlinked cache)
    Installing tidyr (0.6.2) ... 
        OK (symlinked cache)
    Initialization complete!
    

    In other words, packages don't seem to go from the global library into the cache, but they can go into the cache from other Packrat project libraries.

    Quickly installing packages into a Packrat project from the user's home (~) library

    As far as I can tell, the cache option can't shorten install times for packages that haven't already been installed by Packrat. This can be a problem when installing large packages, such as the tidyverse, from source (as you have to on Linux, where CRAN doesn't provide binary packages).

    There are a couple of workarounds:

    Workaround 1: Symlink your library

    A straightforward workaround is to symlink the user's package library into an empty Packrat directory. This takes only a few seconds, and it doesn't seem to interfere with creating a snapshot as long as packrat::clean() is run at the end of development.

    Steps

    In RStudio, create a New Project with "Use packrat with this project" ticked, then run:

    source('https://raw.githubusercontent.com/willbowditch/ratpack/master/R/ratpack.R')
    symlink_packages()
    # Develop as normal, then run:
    packrat::clean()
    packrat::snapshot(ignore.stale=TRUE) 
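
    The source of symlink_packages() isn't reproduced here, so the following is only a hypothetical sketch of the same idea using base R's file.symlink(), not the ratpack implementation. It assumes a package is any directory containing a DESCRIPTION file, and that the project library follows the packrat/lib/<platform>/<R version> layout shown earlier.

    ```r
    # Hypothetical sketch, NOT the ratpack implementation: symlink each package in
    # `from_lib` into `to_lib`, leaving packages the project already has alone.
    symlink_library <- function(from_lib, to_lib) {
      dir.create(to_lib, recursive = TRUE, showWarnings = FALSE)
      pkgs <- list.files(from_lib)
      # Assumption: a package is any directory that contains a DESCRIPTION file
      pkgs <- pkgs[file.exists(file.path(from_lib, pkgs, "DESCRIPTION"))]
      # Skip anything already present in the project library
      pkgs <- pkgs[!file.exists(file.path(to_lib, pkgs))]
      if (length(pkgs) == 0) return(invisible(logical(0)))
      linked <- file.symlink(file.path(from_lib, pkgs), file.path(to_lib, pkgs))
      names(linked) <- pkgs
      invisible(linked)
    }

    # Example usage (assumes R_LIBS_USER points at the home library):
    # project_lib <- file.path("packrat", "lib", R.version$platform,
    #                          as.character(getRversion()))
    # symlink_library(path.expand(Sys.getenv("R_LIBS_USER")), project_lib)
    ```

    Packages already in the project library are skipped, so running it twice is harmless.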
    

    Workaround 2: external.packages

    Packrat does provide a workaround for large packages via packrat::set_opts(external.packages=c('pkgname')), but packages used this way aren't included in the packrat/src folder.

    In effect, the option symlinks the package directories into the packrat/lib-ext directory.

    I had a go at automating this in the same way as the symlinking option: grab all the packages in the user's home library and add them to the external.packages option.

    Steps

    In RStudio, create a New Project with "Use packrat with this project" ticked, then run:

    source('https://raw.githubusercontent.com/willbowditch/ratpack/master/R/ratpack.R')
    import_user_packages()
    # All installed packages will now be accessible within the packrat session
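
    Similarly, the following is a hypothetical sketch (not import_user_packages() itself) of collecting the package names in a library so they can be passed to packrat::set_opts(external.packages = ...). It reads the Package field of each DESCRIPTION file with read.dcf(), and assumes R_LIBS_USER points at the user's home library.

    ```r
    # Hypothetical helper, NOT the ratpack implementation: names of the packages
    # in a library, taken from the Package field of each DESCRIPTION file.
    library_packages <- function(lib) {
      desc <- file.path(list.files(lib, full.names = TRUE), "DESCRIPTION")
      desc <- desc[file.exists(desc)]
      vapply(desc, function(f) read.dcf(f, fields = "Package")[1, "Package"],
             character(1), USE.NAMES = FALSE)
    }

    # Inside a packrat project, the result could then be handed to packrat:
    # user_lib <- path.expand(Sys.getenv("R_LIBS_USER"))
    # packrat::set_opts(external.packages = library_packages(user_lib))
    ```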
    

    To reset at the end of development:

    packrat::set_opts(external.packages=NULL)
    packrat::snapshot()
    packrat::restore() # This step will install the packages if they're not in the cache
    

    The simplest option

    Somewhere in between these options might make the most sense: users curate a list of large but commonly used packages to be symlinked (e.g. packrat::set_opts(external.packages=c('tidyverse', 'data.table'))) and put up with installing smaller packages on a project-by-project basis.
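
    Before handing a curated list to packrat::set_opts(), it can help to check that each package is actually present in the home library. The helper below is a hypothetical sketch (curated_external is not part of packrat); it treats any directory name in the library as an installed package, which is a simplifying assumption.

    ```r
    # Hypothetical helper (not part of packrat): keep the curated packages that
    # have a directory in `lib`, and report any that are missing.
    curated_external <- function(wanted, lib) {
      installed <- list.files(lib)
      missing <- setdiff(wanted, installed)
      if (length(missing) > 0) {
        message("Not found in ", lib, ": ", paste(missing, collapse = ", "))
      }
      intersect(wanted, installed)
    }

    # Example, from inside a packrat project:
    # user_lib <- path.expand(Sys.getenv("R_LIBS_USER"))
    # packrat::set_opts(external.packages =
    #   curated_external(c("tidyverse", "data.table"), user_lib))
    ```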