With the advent of reticulate, combining R and Python in a single .Rmd document has become increasingly popular among the R community (myself included). Now, my personal workflow usually starts with an R script and, at some point, I create a shareable report using knitr::spin() with the plain .R document as input in order to avoid code duplication (see also Knitr's best hidden gem: spin for more on the topic).
However, as soon as Python code is involved in my analysis, I am currently forced to break this workflow and manually convert (i.e. copy and paste) my initial .R script into .Rmd before compiling the report. Does anybody know whether it is possible, or will ever be possible, to make knitr::spin() work with both R and Python code chunks in a single .R file without taking this detour? I mean, just like it works when mixing the two languages, and exchanging objects between them, in a .Rmd file. To the best of my knowledge, there is currently no way to add something like engine = 'python' to spin documents.
Using reticulate::source_python() could be one solution.
For example, here is a simple .R script which will be spun to .Rmd and then rendered to .html:
spin-me.R
#'---
#'title: R and Python in a spin file.
#'---
#'
#' This is an example of one way to write a single R script that uses both R and
#' Python and can be spun to .Rmd via knitr::spin.
#'
#+ label = "setup"
library(nycflights13)
library(ggplot2)
library(reticulate)
use_condaenv()
#'
#' Create the file flights.csv to be read by the Python script.
#'
#+ label = "create_flights_csv"
write.csv(flights, file = "flights.csv")
#'
#' The file flights.py will read in the data from the flights.csv file. It can
#' be evaluated in this script via source_python(). This should add a data.frame
#' called `py_flights` to the workspace.
source_python(file = "flights.py")
#'
#' And now, plot the results.
#'
#+ label = "plot"
ggplot(py_flights) + aes(carrier, arr_delay) + geom_point() + geom_jitter()
# /* spin and knit this file to html
knitr::spin(hair = "spin-me.R", knit = FALSE)
rmarkdown::render("spin-me.Rmd")
# */
The Python file is:
flights.py
import pandas
py_flights = pandas.read_csv("flights.csv")
py_flights = py_flights[py_flights['dest'] == "ORD"]
py_flights = py_flights[['carrier', 'dep_delay', 'arr_delay']]
py_flights = py_flights.dropna()
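If you want to check what the filtering in flights.py does without going through R first, here is a minimal sketch that runs the same three steps on a tiny made-up CSV. The function name filter_flights and the sample rows are my own, purely for illustration; only the pandas calls are taken from the script above.

```python
import io

import pandas as pd


def filter_flights(csv_source, dest="ORD"):
    """Hypothetical helper wrapping the same steps as flights.py."""
    flights = pd.read_csv(csv_source)
    # Keep only the flights going to the requested destination.
    flights = flights[flights["dest"] == dest]
    # Keep only the columns needed for the plot.
    flights = flights[["carrier", "dep_delay", "arr_delay"]]
    # Drop rows with missing delays.
    return flights.dropna()


# Made-up sample data standing in for flights.csv (not real nycflights13 rows).
sample_csv = io.StringIO(
    "carrier,dest,dep_delay,arr_delay\n"
    "UA,ORD,5,10\n"
    "AA,ORD,,\n"
    "DL,ATL,0,-3\n"
    "UA,ORD,-2,1\n"
)

py_flights = filter_flights(sample_csv)
print(py_flights)
```

Only the two complete ORD rows should survive: the AA row is dropped by dropna() and the DL row by the destination filter.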
And a screen capture of the resulting .html is:
EDIT: If keeping everything in one file is a must, then before the source_python() call you could write the Python code to a file from within the R script, e.g.,
pycode <-
'import pandas
py_flights = pandas.read_csv("flights.csv")
py_flights = py_flights[py_flights["dest"] == "ORD"]
py_flights = py_flights[["carrier", "dep_delay", "arr_delay"]]
py_flights = py_flights.dropna()
'
cat(pycode, file = "temp.py")
source_python(file = "temp.py")
My opinion: having the python code in its own file would be preferable to having it created in the R script for two reasons: