I am using an RStudio project (i.e. a project associated with a working directory, not under version control) to work with confidential data. I want to share my script confidential_script.R and project confidential_project.Rproj with a collaborator without sharing any real data, including temporary files or metadata. I am making sure not to save or share any .RData files. However, RStudio on Windows automatically creates the hidden .Rproj.user folder, which appears to contain project metadata.
Can I share the RStudio project file(s) without compromising any confidential information?
The best way to manage confidential dependencies is to declare them as R objects at the top of the script, so that there is no need to share metadata files such as an R project or RStudio project.
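As a minimal sketch of that idea, the block below gathers every environment-specific value at the top of a script. All names and paths here (dataDir, inputFile, outputFile, checkInput) are hypothetical placeholders, not part of the original question; a collaborator would edit only this block to point the script at their own copy of the data.

```r
# Hypothetical parameters: every name and path below is a placeholder.
# Collaborators edit only this block; nothing else in the script refers
# to a machine-specific or confidential location.
dataDir    <- file.path("data", "confidential")    # assumed location of the real files
inputFile  <- file.path(dataDir, "records.csv")    # assumed input file name
outputFile <- file.path("output", "summary.csv")   # assumed output location

# Stop early with a clear message if the input file is missing.
checkInput <- function(path) {
  if (!file.exists(path)) {
    stop("Input file not found: ", path)
  }
  invisible(path)
}
```

With this layout, only the .R file needs to be shared; the rest of the script would call checkInput(inputFile) before reading any data.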
Ideally one would create a test version of the confidential information that contains random / anonymized data, develop a few tests / reports for validation, and include these items with the R script so the other collaborators can ensure it works before using it with live data.
The script, parameters, test data and test cases make the script completely reproducible.
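The following sketch illustrates one way to build such test data and a simple validation check. The column names, value ranges, and the summarizeByGroup() analysis step are assumptions made purely for illustration; the real confidential data will have its own structure.

```r
# Generate synthetic data that mimics the assumed shape of the real data.
# Column names and ranges are hypothetical.
makeTestData <- function(n = 100, seed = 42) {
  set.seed(seed)  # fixed seed so every collaborator gets the same test data
  data.frame(
    id     = seq_len(n),
    group  = sample(c("A", "B", "C"), n, replace = TRUE),
    amount = round(runif(n, min = 0, max = 1000), 2),
    stringsAsFactors = FALSE
  )
}

# Hypothetical analysis step standing in for the real script's logic:
# mean amount per group.
summarizeByGroup <- function(df) {
  aggregate(amount ~ group, data = df, FUN = mean)
}

testData <- makeTestData()
result   <- summarizeByGroup(testData)

# Simple validation checks a collaborator can run before using live data.
stopifnot(
  is.data.frame(result),
  all(c("group", "amount") %in% names(result)),
  nrow(result) == length(unique(testData$group)),
  all(result$amount >= 0 & result$amount <= 1000)
)
```

Because the seed is fixed, the synthetic data and the checks give the same outcome on every machine, and no real records ever leave the original environment.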
Example: download and combine Pokémon stats files
The following example script downloads statistics for the first seven generations of Pokémon and combines them into a single data frame for subsequent analysis.
# URL of the zip file, assigned to theZipFile
theZipFile <- "https://raw.githubusercontent.com/lgreski/pokemonData/master/pokemonData.zip"
# download in binary mode so the archive is not corrupted on Windows
download.file(theZipFile,
              "pokemonData.zip",
              method = "curl", mode = "wb")
unzip("pokemonData.zip")
# list the extracted files with their full paths
thePokemonFiles <- list.files("./pokemonData",
                              full.names = TRUE)
thePokemonFiles
# read each file into its own data frame
pokemonData <- lapply(thePokemonFiles, read.csv)
# a list of 7 data frames
summary(pokemonData)
# row-bind the list into a single data frame
pokemonData <- do.call(rbind, pokemonData)
summary(pokemonData)