I'm trying to understand some behavior related to setting environment variables within an R session.
Context: on computers with multiple cores, Intel MKL can induce data races during (sufficiently large) matrix multiplications, depending on the threading model. In particular, on Ubuntu, if you do not set MKL_THREADING_LAYER = "GNU" (not the default!), you might get data races.
can_induce_data_race <- function() {
  X <- matrix(1:500 / 500, 50, 10)
  Y <- matrix(1:1000 / 1000, 10, 100)
  norm(X %*% Y)
}
Sys.getenv("MKL_THREADING_LAYER")
#> [1] ""
can_induce_data_race()
#> [1] 2997.423
can_induce_data_race()
#> [1] 2986.476
can_induce_data_race()
#> [1] 2757.553
Now, if I start a new R session using callr::r(), I can both reproduce this issue and, by passing MKL_THREADING_LAYER = "GNU", resolve it.
callr::r(can_induce_data_race)
#> [1] 2997.423
callr::r(can_induce_data_race, env = c(MKL_THREADING_LAYER = "GNU"))
#> [1] 249.7852
I was hoping that I could resolve the issue from within my R session, as follows, but it does not seem to work.
callr::r(can_induce_data_race)
#> [1] 2967.369
Sys.setenv(MKL_THREADING_LAYER = "GNU")
Sys.getenv("MKL_THREADING_LAYER")
#> [1] "GNU"
can_induce_data_race()
#> [1] 2997.423
However, if I use callr::r() at this point, the data race is eliminated. It is also eliminated if I specify MKL_THREADING_LAYER = "GNU" in my .Renviron file.
callr::r(can_induce_data_race)
#> [1] 249.7852
callr::r(can_induce_data_race, env = c(MKL_THREADING_LAYER = "GNU"))
#> [1] 249.7852
Why does MKL_THREADING_LAYER = "GNU" get respected when I specify it in the env argument to callr::r() or via .Renviron, but not when I explicitly set it via Sys.setenv()?
For a number of reasons, it is common for programs and libraries to read environment variables only once, at startup. If you change the value after the library has already been loaded, it is too late: the environment variable setting has already been read and applied, and the variable will not be consulted further. You can only count on the new value being used by child processes you spawn.
None of this is specific to MKL or R; it is in fact very common practice in general. Looking up environment variables is relatively expensive and not thread-safe, and it is often not even practical to change the setting influenced by the variable at runtime. The choice of a threading back-end is very likely such a case.
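To make the read-once pattern concrete, here is a minimal sketch in base R. It does not touch MKL at all: make_library() is a hypothetical stand-in for a native library that caches a setting when it is initialised, and DEMO_THREADING_LAYER is a made-up variable name. It assumes Rscript is available under R.home("bin").

```r
# Hypothetical stand-in for a native library that reads its configuration
# from the environment exactly once, when it is initialised.
make_library <- function() {
  layer <- Sys.getenv("DEMO_THREADING_LAYER", unset = "default")  # read once
  list(threading_layer = function() layer)                        # cached value
}

Sys.unsetenv("DEMO_THREADING_LAYER")
lib <- make_library()                  # "library load": variable not set yet

Sys.setenv(DEMO_THREADING_LAYER = "GNU")
in_process <- lib$threading_layer()    # still the value cached at load time

# A child process inherits the current environment and initialises afresh,
# so it sees the value set by Sys.setenv() above.
rscript <- file.path(R.home("bin"), "Rscript")
child <- system2(
  rscript,
  c("--vanilla", "-e",
    shQuote('writeLines(Sys.getenv("DEMO_THREADING_LAYER"))')),
  stdout = TRUE
)

print(in_process)
print(child)
```

The already-initialised "library" keeps reporting the value it cached, while the freshly started child reports the new one. This mirrors the difference between calling can_induce_data_race() directly and running it through callr::r() after Sys.setenv().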
If you want the environment variable applied to every R session you start, define an exported environment variable MKL_THREADING_LAYER=GNU somewhere in .bashrc, .xsessionrc, in a .desktop file, or in some other equivalent location (depending on what kind of shell you use).
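As a rough illustration of why a startup file works where Sys.setenv() does not, the sketch below starts a child R process whose user environment file is a temporary file. It assumes a Unix-like system, uses R's R_ENVIRON_USER variable to point at that file, and again uses the made-up name DEMO_THREADING_LAYER rather than the real MKL variable.

```r
# Write a throwaway Renviron-style file instead of editing the real ~/.Renviron.
env_file <- tempfile(fileext = ".Renviron")
writeLines("DEMO_THREADING_LAYER=GNU", env_file)

# Start a fresh R process that reads this file during startup.
# system2()'s env argument takes "NAME=value" strings for the child only.
rscript <- file.path(R.home("bin"), "Rscript")
at_startup <- system2(
  rscript,
  c("--no-site-file", "--no-init-file", "-e",
    shQuote('writeLines(Sys.getenv("DEMO_THREADING_LAYER"))')),
  stdout = TRUE,
  env = paste0("R_ENVIRON_USER=", shQuote(env_file))
)

print(at_startup)
unlink(env_file)
```

The value is already in the child's environment before any user code runs, which is the property a startup file or an exported shell variable gives you.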