r rstudio project confidentiality

Do RStudio projects store any temporary data?


I am using an RStudio project to work with confidential data (that is, a project associated with a working directory, not with version control). I want to share my script confidential_script.R and my project file confidential_project.Rproj with a collaborator without sharing any real data, including temporary files or metadata. I am careful not to save or share any .RData files. However, RStudio on Windows automatically creates the hidden .Rproj.user folder, which appears to contain project metadata.

Can I share the RStudio project file(s) without compromising any confidential information?


Solution

  • The best way to manage confidential dependencies is to declare them as R objects at the top of the script. This eliminates the need to share metadata files such as an R project or RStudio project.

    Ideally, one would create a test version of the confidential information that contains random or anonymized data, develop a few tests and reports for validation, and include these items with the R script so that collaborators can confirm it works before using it with live data.

    Together, the script, parameters, test data and test cases make the analysis completely reproducible.

    Example: download and combine Pokémon stats files

    The following example script downloads statistics for the first seven generations of Pokémon and combines them into a single data frame for subsequent analysis.

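    # URL of the zip file, declared at the top of the script
    theZipFile <- "https://raw.githubusercontent.com/lgreski/pokemonData/master/pokemonData.zip"

    # download in binary mode so the archive is not corrupted on Windows;
    # method = "curl" requires the curl command-line tool to be installed
    download.file(theZipFile,
                  "pokemonData.zip",
                  method = "curl", mode = "wb")
    unzip("pokemonData.zip")

    # full paths of the extracted CSV files, one per generation
    thePokemonFiles <- list.files("./pokemonData",
                                  full.names = TRUE)
    thePokemonFiles

    # read each file into its own data frame
    pokemonData <- lapply(thePokemonFiles, read.csv)

    # a list of 7 data frames
    summary(pokemonData)

    # stack the data frames into a single data frame
    pokemonData <- do.call(rbind, pokemonData)

    summary(pokemonData)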
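
The advice above about declaring dependencies at the top of the script and shipping a random stand-in for the confidential data could look roughly like the following. This is only a minimal sketch: the column names (id, group, score), the file name synthetic_data.csv, and the helper functions makeSyntheticData() and summarizeByGroup() are all hypothetical and would need to mirror the layout of the real data.

```r
# Parameters: the only lines a collaborator should need to edit.
# Hypothetical path; locally it would point at the real confidential file.
dataFile <- file.path(tempdir(), "synthetic_data.csv")
nRows    <- 50        # size of the synthetic stand-in
set.seed(20240101)    # fixed seed so the test data is reproducible

# Build a synthetic stand-in with the same column layout as the real data.
# The columns here are invented purely for illustration.
makeSyntheticData <- function(n) {
  data.frame(
    id    = sprintf("P%03d", seq_len(n)),
    group = sample(c("A", "B", "C"), n, replace = TRUE),
    score = round(rnorm(n, mean = 50, sd = 10), 1),
    stringsAsFactors = FALSE
  )
}

syntheticData <- makeSyntheticData(nRows)
write.csv(syntheticData, dataFile, row.names = FALSE)

# The analysis is written against the parameter, not a hard-coded file.
summarizeByGroup <- function(path) {
  d <- read.csv(path, stringsAsFactors = FALSE)
  aggregate(score ~ group, data = d, FUN = mean)
}

result <- summarizeByGroup(dataFile)

# Minimal validation tests that collaborators can run before using live data.
stopifnot(
  is.data.frame(result),
  identical(names(result), c("group", "score")),
  nrow(result) <= 3,
  all(is.finite(result$score))
)
```

With this layout, only the script and the synthetic file need to be shared. The collaborator changes dataFile to the location of the real data on their own machine, so neither the .Rproj file nor the .Rproj.user folder has to travel with the code.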