r ggplot2 r-markdown knitr

Why is knitr failing to see a function defined in a child document?


I am working with Rmarkdown parent and child files, and using a YAML header to make it so that when I click "Knit" in RStudio on a child document, it compiles the parent (as seen here: https://stackoverflow.com/a/79655552/1129889).

Consider the following two files, parent.Rmd and child.Rmd (shown in that order), in the same folder:

---
title: "Test"
output: pdf_document
---

```{r include=FALSE}
knitr::opts_chunk$set(echo = FALSE)
library(ggplot2)

```

```{r child = "child.Rmd"}
```

---
knit: (\(input,...) rmarkdown::render('parent.Rmd',...))
editor_options: 
  chunk_output_type: console
---

```{r}
# Create simple plot
data.frame(
  x = seq(.01,.99,length.out=100),
  y = seq(.01,.99,length.out=100)
) |>
  ggplot(aes(x=x,y=y)) +
  geom_point() -> gg1

```

```{r eval=FALSE}
source(textConnection(
r"[
neglog_trans <- function(base=10){
  trans <- \(x) -logb(x,base=base)
  inv <- \(x) base^(-x)
  scales::trans_new("neglog", transform = trans, inverse = inv)
}
]"
))
```

```{r eval=TRUE}
neglog_trans <- function(base=10){
  trans <- \(x) -logb(x,base=base)
  inv <- \(x) base^(-x)
  scales::trans_new("neglog", transform = trans, inverse = inv)
}
```

```{r}
# Create plot adding custom transformation in scales
gg1 +
  scale_x_continuous(
    transform = "neglog"
  ) +
  scale_y_continuous(
    transform = "neglog"
  )
```

I would expect that clicking the "Knit" button in RStudio would render the parent document. Up to now, this has worked fine.

However, I've run into a strange issue with custom transformations in ggplot2. When I define a custom transformation in the child and click the "Knit" button, I get an error indicating that R can't find the transformation function despite it being defined right before.

The error is:

Error in `as.transform()`:
! Could not find any function named `transform_neglog()` or
  `neglog_trans()`

What works

The following actions seem to work as intended, compiling the file.

What doesn't work

I expect this also to work, but it does not:

(\(input,...) rmarkdown::render('parent.Rmd',...))()
callr::r((\(input,...) rmarkdown::render('parent.Rmd',...)))

It is especially a mystery to me why reading in the function via source() works, but simply giving the function in a chunk does not!

I can also confirm that I tried this with the development version of knitr with the same result.


> xfun::session_info()
R version 4.4.2 (2024-10-31)
Platform: aarch64-apple-darwin20
Running under: macOS Sequoia 15.5

Locale: en_US.UTF-8 / en_US.UTF-8 / en_US.UTF-8 / C / en_US.UTF-8 / en_US.UTF-8

Package version:
  base64enc_0.1.3    bslib_0.9.0       
  cachem_1.1.0       callr_3.7.6       
  cli_3.6.5          compiler_4.4.2    
  digest_0.6.37      dplyr_1.1.4       
  evaluate_1.0.3     farver_2.1.2      
  fastmap_1.2.0      fontawesome_0.5.3 
  fs_1.6.6           generics_0.1.4    
  ggplot2_3.5.2      glue_1.8.0        
  graphics_4.4.2     grDevices_4.4.2   
  grid_4.4.2         gtable_0.3.6      
  here_1.0.1         highr_0.11        
  htmltools_0.5.8.1  isoband_0.2.7     
  jquerylib_0.1.4    jsonlite_2.0.0    
  knitr_1.50         labeling_0.4.3    
  lattice_0.22.7     lifecycle_1.0.4   
  magrittr_2.0.3     MASS_7.3.65       
  Matrix_1.7.3       memoise_2.0.1     
  methods_4.4.2      mgcv_1.9.3        
  mime_0.13          nlme_3.1.168      
  pillar_1.10.2      pkgconfig_2.0.3   
  processx_3.8.6     ps_1.9.1          
  R6_2.6.1           rappdirs_0.3.3    
  RColorBrewer_1.1-3 renv_1.1.4        
  rlang_1.1.6        rmarkdown_2.29    
  rprojroot_2.0.4    sass_0.4.10       
  scales_1.4.0       splines_4.4.2     
  stats_4.4.2        tibble_3.3.0      
  tidyselect_1.2.1   tinytex_0.57      
  tools_4.4.2        utf8_1.2.6        
  utils_4.4.2        vctrs_0.6.5       
  viridisLite_0.4.2  withr_3.0.2       
  xfun_0.52          yaml_2.3.10    
> xfun::session_info('knitr')
R version 4.4.2 (2024-10-31)
Platform: aarch64-apple-darwin20
Running under: macOS Sequoia 15.5

Locale: en_US.UTF-8 / en_US.UTF-8 / en_US.UTF-8 / C / en_US.UTF-8 / en_US.UTF-8

Package version:
  evaluate_1.0.3  graphics_4.4.2  grDevices_4.4.2
  highr_0.11      knitr_1.50      methods_4.4.2  
  stats_4.4.2     tools_4.4.2     utils_4.4.2    
  xfun_0.52       yaml_2.3.10  
> xfun::session_info('rmarkdown')
R version 4.4.2 (2024-10-31)
Platform: aarch64-apple-darwin20
Running under: macOS Sequoia 15.5

Locale: en_US.UTF-8 / en_US.UTF-8 / en_US.UTF-8 / C / en_US.UTF-8 / en_US.UTF-8

Package version:
  base64enc_0.1.3   bslib_0.9.0       cachem_1.1.0     
  cli_3.6.5         digest_0.6.37     evaluate_1.0.3   
  fastmap_1.2.0     fontawesome_0.5.3 fs_1.6.6         
  glue_1.8.0        graphics_4.4.2    grDevices_4.4.2  
  highr_0.11        htmltools_0.5.8.1 jquerylib_0.1.4  
  jsonlite_2.0.0    knitr_1.50        lifecycle_1.0.4  
  memoise_2.0.1     methods_4.4.2     mime_0.13        
  R6_2.6.1          rappdirs_0.3.3    rlang_1.1.6      
  rmarkdown_2.29    sass_0.4.10       stats_4.4.2      
  tinytex_0.57      tools_4.4.2       utils_4.4.2      
  xfun_0.52         yaml_2.3.10      

Pandoc version: 3.4

Why is knitr acting this way, and how can I fix it (if possible) so that I get the same behaviour whether I click "Knit" on the child document or the parent document?


Solution

  • Edited: after much thinking, the behavior makes sense and is mostly unavoidable. Using transform=neglog_trans() is not only canonical, it is unambiguous, with no loss of generality or functionality. In contrast, using a string à la transform="neglog" is a convenience only: it adds a cost (finding the real function) and loses functionality.

    (This discussion is well-informed by https://adv-r.hadley.nz/environments.html.)

    The issue is with how environments are hierarchically searched and how R will search through them to find an object. Rendering/knitting a document uses a temporary environment which is a child and not a parent of the working (typically global) env. When scales::as.transform is called (by ggplot2-code here), its apparent search path is unambiguous. Starting from a temp environment (how knitr works), I defined your gg1 and your function, and then

    neglog_trans
    # function(base=10){
    #   trans <- \(x) -logb(x,base=base)
    #   inv <- \(x) base^(-x)
    #   scales::trans_new("neglog", transform = trans, inverse = inv)
    # }
    # <environment: 0x12616e190>   # <--- our temporary knitting environment
    
    debugonce(scales::as.transform)
    gg1 + scale_x_continuous(transform = "neglog")
    Browse[2]> # in the R debugger
    
    environment()
    # <environment: 0x1167598e0>   # <--- not in our temporary knitting environment
    
    match.call()
    # as.transform(x = transform)
    
    rlang::search_envs()
    #  [[1]] $ <env: global>
    #  [[2]] $ <env: package:ggplot2>
    #  [[3]] $ <env: ESSR>
    #  [[4]] $ <env: package:stats>
    #  [[5]] $ <env: package:graphics>
    #  [[6]] $ <env: package:grDevices>
    #  [[7]] $ <env: package:utils>
    #  [[8]] $ <env: package:datasets>
    #  [[9]] $ <env: package:r2>
    # [[10]] $ <env: package:methods>
    # [[11]] $ <env: Autoloads>
    # [[12]] $ <env: package:base>
    

    (r2 is my personal package of utilities. ESSR is because I use emacs/ESS instead of RStudio.) Notice that the (temporary) working environment 0x12616e190 is not in its search environment.
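
    The same lookup failure can be reproduced in base R, without knitr or scales. This is only a minimal sketch, not the actual scales code: find_trans() is a hypothetical stand-in for as.transform(), and knit_env is a hypothetical stand-in for the temporary knitting environment.

    ```r
    # Hypothetical stand-in for scales::as.transform(): it resolves a name
    # starting from its own frame and enclosing environments, not the caller's.
    find_trans <- function(name) {
      get0(paste0(name, "_trans"), mode = "function")
    }

    # Stand-in for the temporary knitting environment: a child of the
    # environment this code runs in (typically the global environment).
    knit_env <- new.env(parent = environment())

    # "Chunk 1": define the transformation inside knit_env only
    # (the body is irrelevant for this demonstration).
    evalq(neglog_trans <- function(base = 10) base, knit_env)

    # "Chunk 2": ask for it by name from inside knit_env.
    by_name <- evalq(find_trans("neglog"), knit_env)
    is.null(by_name)       # TRUE: knit_env is not on find_trans()'s lookup chain

    # Handing over the function object itself requires no lookup at all.
    by_value <- evalq(neglog_trans, knit_env)
    is.function(by_value)  # TRUE
    ```

    The child environment can see everything its parents define, but nothing defined in a parent (or in a package namespace) can see back down into the child.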

    This is not under the control of scales or knitr: neither controls the search path or how objects/functions are "found". There are three ways that as.transform would be able to find your neglog_trans:

    1. If scales knows the temp-environment. For this to happen, we'd need to pass that environment down the call chain, meaning pass it to scale_x_continuous(transform="neglog", working_env=...) (clearly this does not work) so that it can pass it on to as.transform(), where get0(f2, mode="function") could then find neglog_trans. Asking other packages' functions to pass around arbitrary search paths does not seem practical.

    2. If neglog_trans is defined in one of the environments/namespaces that scales knows. For example, if neglog_trans is defined in my globalenv, then the temporary-knitting environment sees it and so does scales. In this case, you should be able to use transform="neglog" without a problem.

      This works, but in general it is much preferred for an rmarkdown document to access objects that are (1) created within the document, or (2) explicitly passed via its params: yaml portion (see https://bookdown.org/yihui/rmarkdown-cookbook/parameterized-reports.html).

    3. If the object passed to transform= were the actual function, therefore "no searching required".

      The use of a string is a convenience: it provides no added functionality and no added generality. The only thing a string gives you is fewer keystrokes (and perhaps a look you prefer, if you like strings as arguments). It is analogous to passing the string "y" to denote the logical TRUE as a convenience: it is fewer characters, but passing the object itself is more direct.

    Many people may intentionally or unintentionally take the path of option #2, though relying on the global environment for objects when knitting can be risky and result in non-reproducible reports.
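
    One way to end up on the option #2 path without noticing is source(): with its default local = FALSE, it evaluates the code in the global environment, regardless of which environment the call was made from. That is a plausible explanation (not verified against the question's exact setup) for why the source(textConnection(...)) variant compiles. A minimal base-R sketch, where knit_env and sourced_helper are hypothetical names:

    ```r
    # Stand-in for the temporary knitting environment.
    knit_env <- new.env(parent = globalenv())

    # Call source() from inside knit_env, as a chunk would.
    local({
      con <- textConnection("sourced_helper <- function() 'defined via source()'")
      source(con)  # default local = FALSE: evaluated in the global environment
      close(con)
    }, envir = knit_env)

    # The helper was created globally, not in the environment that called source().
    in_global <- exists("sourced_helper", envir = globalenv(), inherits = FALSE)
    in_knit   <- exists("sourced_helper", envir = knit_env, inherits = FALSE)
    c(global = in_global, knit_env = in_knit)  # global TRUE, knit_env FALSE

    # Clean up the global environment afterwards.
    rm("sourced_helper", envir = globalenv())
    ```

    A function that lands in the global environment this way is visible to package code, which is convenient but is exactly the hidden dependency that makes a report harder to reproduce.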

    Option #3 is the canonical approach. To demonstrate why this is canonical and in fact more powerful than using "neglog", what happens if you want to use your neglog function with base=2?
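
    With the string form there is nowhere to put the argument, while the function form takes it directly. A minimal sketch, assuming ggplot2 (>= 3.5.0, for the transform= argument) and scales are installed; gg1 and neglog_trans() are repeated from the question so that the block runs on its own, and gg2 and tr are names introduced here for illustration:

    ```r
    library(ggplot2)

    # Same definition as in the question.
    neglog_trans <- function(base = 10) {
      trans <- \(x) -logb(x, base = base)
      inv <- \(x) base^(-x)
      scales::trans_new("neglog", transform = trans, inverse = inv)
    }

    # Same simple plot as in the question.
    gg1 <- ggplot(
      data.frame(
        x = seq(.01, .99, length.out = 100),
        y = seq(.01, .99, length.out = 100)
      ),
      aes(x = x, y = y)
    ) +
      geom_point()

    # transform = "neglog" can only ever mean neglog_trans() with its defaults;
    # calling the function lets you choose the base per scale.
    gg2 <- gg1 +
      scale_x_continuous(transform = neglog_trans(base = 2)) +
      scale_y_continuous(transform = neglog_trans(base = 2))

    # The transformation object carries the chosen base with it.
    tr <- neglog_trans(base = 2)
    tr$transform(0.25)  # -log2(0.25)
    tr$inverse(2)       # 2^(-2)
    ```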

    So the "you" that first defined neglog_trans() actually counted on this ability to change the base= at render-time by making it a parameterized function. I think you owe past-you some gratitude, you did it well.

    In the end, my recommendation is to change from the "convenient" use of strings to the unambiguous function. It works, it will always work, and it enables an argument that you yourself programmed into the function itself (which strings would not allow).

    gg1 +
      scale_x_continuous(
        transform = neglog_trans()
      ) +
      scale_y_continuous(
        transform = neglog_trans()
      )
    # plot rendered, no error