I am working with Rmarkdown parent and child files, and using a YAML header to make it so that when I click "Knit" in RStudio on a child document, it compiles the parent (as seen here: https://stackoverflow.com/a/79655552/1129889).
Consider the following two files, in the same folder:
parent.Rmd
---
title: "Test"
output: pdf_document
---
```{r include=FALSE}
knitr::opts_chunk$set(echo = FALSE)
library(ggplot2)
```
```{r child = "child.Rmd"}
```
child.Rmd
---
knit: (\(input,...) rmarkdown::render('parent.Rmd',...))
editor_options:
  chunk_output_type: console
---
```{r}
# Create simple plot
data.frame(
  x = seq(.01,.99,length.out=100),
  y = seq(.01,.99,length.out=100)
) |>
  ggplot(aes(x=x,y=y)) +
  geom_point() -> gg1
```
```{r eval=FALSE}
source(textConnection(
  r"[
  neglog_trans <- function(base=10){
    trans <- \(x) -logb(x,base=base)
    inv <- \(x) base^(-x)
    scales::trans_new("neglog", transform = trans, inverse = inv)
  }
  ]"
))
```
```{r eval=TRUE}
neglog_trans <- function(base=10){
  trans <- \(x) -logb(x,base=base)
  inv <- \(x) base^(-x)
  scales::trans_new("neglog", transform = trans, inverse = inv)
}
```
```{r}
# Create plot adding custom transformation in scales
gg1 +
  scale_x_continuous(
    transform = "neglog"
  ) +
  scale_y_continuous(
    transform = "neglog"
  )
```
I would expect that clicking the "Knit" button in RStudio would render the parent document. Up to now, this has worked fine.
However, I've run into a strange issue with custom transformations in ggplot2. When I define a custom transformation in the child and click the "Knit" button, I get an error indicating that R can't find the transformation function despite it being defined right before.
The error is:
Error in `as.transform()`:
! Could not find any function named `transform_neglog()` or
`neglog_trans()`
The following actions seem to work as intended, compiling the file:
Not using the custom transform.
Clicking "Knit" for parent.Rmd.
Reading the function in via source() just before the ggplot chunk (i.e., changing eval to TRUE in the source() chunk in child.Rmd and setting the function declaration chunk to eval=FALSE) and then clicking "Knit" for child.Rmd.
I expect this also to work, but it does not:
Clicking "Knit" for child.Rmd
I get the same error when I run the following code in the R console to call rmarkdown::render() on the parent. I don't understand why this doesn't yield the same result as the "Knit" button on the parent.
(\(input,...) rmarkdown::render('parent.Rmd',...))()
callr::r((\(input,...) rmarkdown::render('parent.Rmd',...)))
It is especially a mystery to me why reading in the function via source() works, but simply giving the function in a chunk does not!
I can also confirm that I tried this with the development version of knitr with the same result.
> xfun::session_info()
R version 4.4.2 (2024-10-31)
Platform: aarch64-apple-darwin20
Running under: macOS Sequoia 15.5
Locale: en_US.UTF-8 / en_US.UTF-8 / en_US.UTF-8 / C / en_US.UTF-8 / en_US.UTF-8
Package version:
base64enc_0.1.3 bslib_0.9.0
cachem_1.1.0 callr_3.7.6
cli_3.6.5 compiler_4.4.2
digest_0.6.37 dplyr_1.1.4
evaluate_1.0.3 farver_2.1.2
fastmap_1.2.0 fontawesome_0.5.3
fs_1.6.6 generics_0.1.4
ggplot2_3.5.2 glue_1.8.0
graphics_4.4.2 grDevices_4.4.2
grid_4.4.2 gtable_0.3.6
here_1.0.1 highr_0.11
htmltools_0.5.8.1 isoband_0.2.7
jquerylib_0.1.4 jsonlite_2.0.0
knitr_1.50 labeling_0.4.3
lattice_0.22.7 lifecycle_1.0.4
magrittr_2.0.3 MASS_7.3.65
Matrix_1.7.3 memoise_2.0.1
methods_4.4.2 mgcv_1.9.3
mime_0.13 nlme_3.1.168
pillar_1.10.2 pkgconfig_2.0.3
processx_3.8.6 ps_1.9.1
R6_2.6.1 rappdirs_0.3.3
RColorBrewer_1.1-3 renv_1.1.4
rlang_1.1.6 rmarkdown_2.29
rprojroot_2.0.4 sass_0.4.10
scales_1.4.0 splines_4.4.2
stats_4.4.2 tibble_3.3.0
tidyselect_1.2.1 tinytex_0.57
tools_4.4.2 utf8_1.2.6
utils_4.4.2 vctrs_0.6.5
viridisLite_0.4.2 withr_3.0.2
xfun_0.52 yaml_2.3.10
> xfun::session_info('knitr')
R version 4.4.2 (2024-10-31)
Platform: aarch64-apple-darwin20
Running under: macOS Sequoia 15.5
Locale: en_US.UTF-8 / en_US.UTF-8 / en_US.UTF-8 / C / en_US.UTF-8 / en_US.UTF-8
Package version:
evaluate_1.0.3 graphics_4.4.2 grDevices_4.4.2
highr_0.11 knitr_1.50 methods_4.4.2
stats_4.4.2 tools_4.4.2 utils_4.4.2
xfun_0.52 yaml_2.3.10
> xfun::session_info('rmarkdown')
R version 4.4.2 (2024-10-31)
Platform: aarch64-apple-darwin20
Running under: macOS Sequoia 15.5
Locale: en_US.UTF-8 / en_US.UTF-8 / en_US.UTF-8 / C / en_US.UTF-8 / en_US.UTF-8
Package version:
base64enc_0.1.3 bslib_0.9.0 cachem_1.1.0
cli_3.6.5 digest_0.6.37 evaluate_1.0.3
fastmap_1.2.0 fontawesome_0.5.3 fs_1.6.6
glue_1.8.0 graphics_4.4.2 grDevices_4.4.2
highr_0.11 htmltools_0.5.8.1 jquerylib_0.1.4
jsonlite_2.0.0 knitr_1.50 lifecycle_1.0.4
memoise_2.0.1 methods_4.4.2 mime_0.13
R6_2.6.1 rappdirs_0.3.3 rlang_1.1.6
rmarkdown_2.29 sass_0.4.10 stats_4.4.2
tinytex_0.57 tools_4.4.2 utils_4.4.2
xfun_0.52 yaml_2.3.10
Pandoc version: 3.4
Why is knitr acting this way, and how can I fix it (if possible) so that I get the same behaviour whether I click "Knit" on the child document or the parent document?
Edited: after much thinking, the behavior makes sense and is mostly unavoidable. The use of transform=neglog_trans() is not only canonical, it is unambiguous with no loss of generality or functionality. In contrast, the use of strings à la transform="neglog" is a convenience only, and it comes with an added cost (finding the real function) and a loss of functionality.
(This discussion is well-informed by https://adv-r.hadley.nz/environments.html.)
The issue is with how environments are arranged hierarchically and how R searches through them to find an object. Rendering/knitting a document uses a temporary environment which is a child and not a parent of the working (typically global) env. When scales::as.transform is called (by ggplot2 code), its apparent search path is unambiguous. Starting from a temp environment (how knitr works), I defined your gg1 and your function, and then ran the following:
neglog_trans
# function(base=10){
#   trans <- \(x) -logb(x,base=base)
#   inv <- \(x) base^(-x)
#   scales::trans_new("neglog", transform = trans, inverse = inv)
# }
# <environment: 0x12616e190> # <--- our temporary knitting environment
debugonce(scales::as.transform)
gg1 + scale_x_continuous(transform = "neglog")
Browse[2]> # in the R debugger
environment()
# <environment: 0x1167598e0> # <--- not in our temporary knitting environment
match.call()
# as.transform(x = transform)
rlang::search_envs()
# [[1]] $ <env: global>
# [[2]] $ <env: package:ggplot2>
# [[3]] $ <env: ESSR>
# [[4]] $ <env: package:stats>
# [[5]] $ <env: package:graphics>
# [[6]] $ <env: package:grDevices>
# [[7]] $ <env: package:utils>
# [[8]] $ <env: package:datasets>
# [[9]] $ <env: package:r2>
# [[10]] $ <env: package:methods>
# [[11]] $ <env: Autoloads>
# [[12]] $ <env: package:base>
(r2 is my personal package of utilities. ESSR is because I use emacs/ESS instead of RStudio.) Notice that the (temporary) working environment 0x12616e190 is not in its search environment.
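The lookup failure can be reproduced in base R without ggplot2 or knitr. This is only a minimal sketch: find_trans is a hypothetical helper loosely modelled on a get0(..., mode = "function") lookup, and knit_env is an assumed stand-in for the temporary knitting environment; neither is the actual scales or knitr code.

```r
# Hypothetical stand-in for a package function that resolves a name to a
# function. It searches its own frame, then its enclosure, then the search path.
find_trans <- function(name) {
  get0(paste0(name, "_trans"), mode = "function")
}

# Assumed stand-in for the temporary knitting environment:
# a child of the global environment, not a parent of it.
knit_env <- new.env(parent = globalenv())

# Define the transformation only inside the temporary environment.
assign("neglog_trans", function(base = 10) base, envir = knit_env)

# Code evaluated in knit_env can see the function ...
found_from_knit_env <- exists("neglog_trans", envir = knit_env, mode = "function")

# ... but the helper, called from knit_env, searches from its own enclosure,
# so it never looks inside knit_env and returns NULL.
found_by_helper <- evalq(find_trans("neglog"), knit_env)

# Passing the function itself needs no search at all.
passed_directly <- evalq(neglog_trans, knit_env)
```

Here found_from_knit_env is TRUE while found_by_helper is NULL, which mirrors the situation above: the function exists where the chunk runs, but not anywhere the lookup is able to see.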
This is not under the control of scales or knitr: they do not control the search path nor how objects/functions are "found". There are two ways that as.transform would be able to find your neglog_trans, plus a third option that avoids the search altogether:
1. If scales knows the temp-environment. For this to happen, we'd need to pass that argument up the chain, meaning pass it to scale_x_continuous(transform="neglog", working_env=...) (clearly this does not work) so that it can pass it to as.transform() so that its get0(f2, mode="function") could find neglog_trans. Asking other packages' functions to pass around arbitrary search paths does not seem practical.
2. If neglog_trans is defined in one of the environments/namespaces that scales knows. For example, if neglog_trans is defined in my globalenv, then the temporary-knitting environment sees it and so does scales. In this case, you should be able to use transform="neglog" without a problem. This works, but in general it is much preferred for an rmarkdown document to access objects that are (1) created within the document, or (2) explicitly passed via its params: yaml portion (see https://bookdown.org/yihui/rmarkdown-cookbook/parameterized-reports.html).
3. If the object passed to transform= is the actual transformation object (the result of calling neglog_trans()), then no searching is required. The use of a string is a convenience: it provides no added functionality, it has no added generality. The only thing using strings gives you is fewer keystrokes (and perhaps aesthetics, if you prefer strings as arguments). It could be analogous to passing a string "y" to denote the logical TRUE as a convenience object: it is fewer characters, but passing the object itself is more direct.
Many people may intentionally or unintentionally take the path of option #2, though relying on the global environment for objects when knitting can be risky and result in non-reproducible reports.
Option #3 is the canonical approach. To demonstrate why this is canonical and in fact more powerful than using "neglog", what happens if you want to use your neglog function with base=2?
If you want to stay with strings using transform="neglog" (and use one of the above ways for as.transform to find it), you would then need to define a different function (or overwrite the first), perhaps neglog2_trans <- function() ... and call transform="neglog2".
Canonical use:
gg1 +
  scale_x_continuous(
    transform = neglog_trans(base = 2)
  ) +
  scale_y_continuous(
    transform = neglog_trans() # default base=10
  )
So the "you" that first defined neglog_trans() actually counted on this ability to change the base= at render-time by making it a parameterized function. I think you owe past-you some gratitude; you did it well.
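To see what the base= argument buys you, here is a small base-R sketch of the transform/inverse pair on its own. The name neglog_pair is hypothetical and it returns a plain list rather than a scales transformation object, so it only illustrates the arithmetic, not the plotting.

```r
# Hypothetical base-R stand-in for the transform/inverse pair inside
# neglog_trans(); returns a plain list, not a scales transformation object.
neglog_pair <- function(base = 10) {
  list(
    transform = \(x) -logb(x, base = base),
    inverse   = \(x) base^(-x)
  )
}

x <- c(0.01, 0.1, 0.5, 0.99)

base10 <- neglog_pair()          # default base = 10
base2  <- neglog_pair(base = 2)  # same function, different argument

# Each inverse undoes its own transform (up to floating-point tolerance).
round_trip_10 <- isTRUE(all.equal(base10$inverse(base10$transform(x)), x))
round_trip_2  <- isTRUE(all.equal(base2$inverse(base2$transform(x)), x))
```

One definition covers every base, whereas the string form would need a separately named function for each one.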
In the end, my recommendation is to change from the "convenient" use of strings to the unambiguous function. It works, it will always work, and it enables an argument that you yourself programmed into the function itself (which strings would not allow).
gg1 +
  scale_x_continuous(
    transform = neglog_trans()
  ) +
  scale_y_continuous(
    transform = neglog_trans()
  )
# plot rendered, no error