R knitr: use spin() with R and Python code


With the advent of reticulate, combining R and Python in a single .Rmd document has become increasingly popular in the R community (myself included). Now, my personal workflow usually starts with an R script and, at some point, I create a shareable report using knitr::spin() with the plain .R document as input in order to avoid code duplication (see also "Knitr's best hidden gem: spin" for more on the topic).

However, as soon as Python code is involved in my analysis, I am currently forced to break this workflow and manually convert (i.e., copy and paste) my initial .R script into .Rmd before compiling the report. I wonder, does anybody know whether it is – or for that matter, will ever be – possible to make knitr::spin() work with both R and Python code chunks in a single .R file without taking this detour? I mean, just like it works when mixing the two languages, and exchanging objects between them, in a .Rmd file. There is, at least to the best of my knowledge, no possibility to add something like engine = 'python' to spin documents at the moment.


Solution

  • Using reticulate::source_python() could be one solution.

    For example, here is a simple .R script which will be spun to .Rmd and then rendered to .html:

    spin-me.R

    #'---
    #'title: R and Python in a spin file.
    #'---
    #'
    #' This is an example of one way to write a single R script that contains both
    #' R and Python and can be spun to .Rmd via knitr::spin.
    #'
    #+ label = "setup"
    library(nycflights13)
    library(ggplot2)
    library(reticulate)
    use_condaenv()
    
    #'
    #' Create the file flights.csv, which flights.py will read.
    #'
    #+ label = "create_flights_csv"
    write.csv(flights, file = "flights.csv")
    
    #'
    #' The file flights.py will read in the data from the flights.csv file.  It can
    #' be evaluated in this script via source_python().  This should add a data.frame
    #' called `py_flights` to the workspace.
    source_python(file = "flights.py")
    
    #'
    #' And now, plot the results.
    #'
    #+ label = "plot"
    ggplot(py_flights) + aes(carrier, arr_delay) + geom_point() + geom_jitter()
    
    
    # /* spin and knit this file to html
    knitr::spin(hair = "spin-me.R", knit = FALSE)
    rmarkdown::render("spin-me.Rmd")
    # */
    

    The Python file is:

    flights.py

    import pandas
    py_flights = pandas.read_csv("flights.csv")
    py_flights = py_flights[py_flights['dest'] == "ORD"]
    py_flights = py_flights[['carrier', 'dep_delay', 'arr_delay']]
    py_flights = py_flights.dropna()
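
One detail of this handoff that may be worth knowing: with its defaults, write.csv() also writes R's row names as a first column with an empty header, and pandas.read_csv() reads that column as `Unnamed: 0`. flights.py is unaffected because it selects its three columns by name. Here is a minimal sketch of that behaviour, assuming pandas is installed; `csv_text` and its rows are made up for illustration and are not taken from nycflights13.

```python
import io

import pandas as pd

# Made-up rows in the layout write.csv() produces by default:
# the first header field is empty and holds R's row names.
csv_text = '''"","carrier","dest","dep_delay","arr_delay"
"1","UA","ORD",2,11
"2","AA","MIA",-3,NA
"3","MQ","ORD",NA,NA
'''

# Default read: the row-name column comes through as "Unnamed: 0".
raw = pd.read_csv(io.StringIO(csv_text))
print(list(raw.columns))

# index_col=0 uses that column as the index instead of as data.
indexed = pd.read_csv(io.StringIO(csv_text), index_col=0)
print(list(indexed.columns))
```

On the R side, passing row.names = FALSE to write.csv() would leave the extra column out of the file in the first place.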
    

    [Image: screen capture of the resulting .html]

    EDIT: If keeping everything in one file is a must, you could write the Python code to a file from within the R script before the source_python call, e.g.,

    pycode <-
    'import pandas
    py_flights = pandas.read_csv("flights.csv")
    py_flights = py_flights[py_flights["dest"] == "ORD"]
    py_flights = py_flights[["carrier", "dep_delay", "arr_delay"]]
    py_flights = py_flights.dropna()
    '
    cat(pycode, file = "temp.py")
    source_python(file = "temp.py")
    

    My opinion: having the python code in its own file would be preferable to having it created in the R script for two reasons:

    1. Easier reuse of the python code
    2. Syntax highlighting in my IDE is lost for the Python code when it is written as a string and not in its own file.
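
To illustrate the reuse point, here is one way the Python side could be arranged as a function rather than as top-level statements. This is only a sketch, assuming pandas is installed: the name `filter_flights`, its `dest` parameter, and the `example` rows are hypothetical and made up for illustration; only the three filtering steps come from flights.py above.

```python
import pandas as pd


def filter_flights(flights, dest="ORD"):
    """Keep carrier and delay columns for one destination, dropping incomplete rows."""
    subset = flights[flights["dest"] == dest]
    subset = subset[["carrier", "dep_delay", "arr_delay"]]
    return subset.dropna()


# Made-up rows for illustration only; not taken from nycflights13.
example = pd.DataFrame(
    {
        "carrier": ["UA", "AA", "MQ", "UA"],
        "dest": ["ORD", "MIA", "ORD", "ORD"],
        "dep_delay": [2.0, -3.0, None, 10.0],
        "arr_delay": [11.0, 5.0, None, float("nan")],
    }
)

ord_flights = filter_flights(example)
print(ord_flights)
```

Since source_python() makes the functions defined in the sourced file available in the R session, a function like this could then be called from the spin script for any destination, instead of editing the Python file each time.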