I am trying to set up a persistent build for a Docker image that will power the core of an R statistics process. So far I have figured out how to install exactly the R packages I am requesting. However, I wonder how relevant the software supplied by the underlying system (in my case Ubuntu 20.04) is to reproducibility in R. I am installing it via apt-get install, but without specifying versions.
Any guidance is appreciated.
This question is quite broad, but you should probably worry first about the compiler and the linear algebra libraries that R is built against.
Beyond that, it will depend on whether the packages you are loading use additional system libraries (see the SystemRequirements: field in the DESCRIPTION file of the package, or on the CRAN web page). For example, sf (a package for spatial data processing) lists:
C++11, GDAL (>= 2.0.1), GEOS (>= 3.4.0), PROJ (>= 4.8.0), sqlite3
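One way to see which installed packages declare system dependencies is to read the SystemRequirements field for everything in the library. This is a minimal sketch, assuming it runs inside the built image; the variable names `ip` and `reqs` are just illustrative.

```r
# List every installed package that declares a SystemRequirements field.
# installed.packages() can return extra DESCRIPTION fields on request.
ip <- installed.packages(fields = "SystemRequirements")

# Keep only packages that actually declare something; drop = FALSE keeps
# the result a matrix even if a single package matches.
has_req <- !is.na(ip[, "SystemRequirements"])
reqs <- ip[has_req, c("Package", "Version", "SystemRequirements"), drop = FALSE]

print(reqs)
```

The output tells you which apt packages are worth pinning or at least recording: a package with no entry here is unlikely to depend on anything beyond the compiler and the linear algebra libraries.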
Speaking only for the first two (compiler and linear algebra libraries), the differences will be at the level of floating-point precision. To the extent that the numerical methods used are robust and the statistical problems you're working with are stable and well-posed, the differences will be small enough to mitigate with standard best practices for floating-point comparison (e.g., using all.equal() rather than == or identical()).
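As a small illustration of why tolerance-based comparison matters, here is a sketch using a classic rounding example rather than an actual cross-platform difference. The values are chosen for illustration only.

```r
# Two values that are mathematically equal but differ in the last bits
x <- 0.1 + 0.2
y <- 0.3

exact_eq  <- x == y                    # exact comparison of doubles
ident_eq  <- identical(x, y)           # also exact
approx_eq <- isTRUE(all.equal(x, y))   # default tolerance is about 1.5e-8

print(c(exact = exact_eq, identical = ident_eq, all.equal = approx_eq))

# all.equal() returns TRUE or a character description of the difference,
# so wrap it in isTRUE() when using it in a condition. The tolerance can
# be loosened if results from different builds differ slightly more.
loose_eq <- isTRUE(all.equal(1, 1 + 1e-6, tolerance = 1e-5))
print(loose_eq)
```

When comparing results produced by two different images, the same pattern applies to whole vectors or data frames, since all.equal() works on those as well.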