I have a large R data frame in which some columns contain rich text as Markdown strings. I need to convert these from Markdown to HTML, in R.
I don't want to render anything, and I don't want any files read or written. I just want to convert the fields of a vector or data frame, so that another program can render the HTML later.
I have searched a lot, but I only seem to find methods that work via files: they read Markdown files, convert them, and write HTML files. My input is simply a data frame, and I'm afraid that going through files will cost a lot of running time.
I tried commonmark::markdown_html(df$col1), as well as markdown::markdownToHTML(text = df$col1, fragment.only = TRUE), but both methods make a mess. That is because of this clause in the specification of markdownToHTML:
text option: a character vector containing the markdown text to transform (each element of this vector is treated as a line in a file).
That last part is the problem: every element of my vector ends up filled with one huge HTML string that contains the content of all rows, so every row holds the same value.
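A small reproduction of the symptom may make this concrete. This is only a sketch: it assumes the commonmark package is installed, and the vector md is made-up example data.

```r
library(commonmark)

# Made-up input: two independent Markdown strings
md <- c("# First", "Second *row*")

# The vector is treated as the lines of a single document,
# so the result is one HTML string rather than one per element
html <- markdown_html(md)

length(html)
```

Assigning that single string to a data frame column recycles it into every row, which matches the behaviour described above.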
Try converting item by item, not a whole vector of Markdown strings in one call.
With pandoc & purrr it might look something like this:
library(dplyr)
library(purrr)
library(stringr)
library(pandoc)

tibble(md = c("## Lorem ipsum dolor sit amet",
              "*cubilia* _nec_ **arcu auctor**",
              "* quis \n* ligula \n* sapien \n")
) |>
  # convert one element at a time, then flatten each result into a single string
  mutate(html = map_chr(md, \(x) pandoc_convert(text = x, to = "html") |> str_flatten()))
#> # A tibble: 3 × 2
#>   md                                html
#>   <chr>                             <chr>
#> 1 "## Lorem ipsum dolor sit amet"   "<h2 id=\"lorem-ipsum-dolor-sit-amet\">Lorem ipsum dolor sit amet</h2>"
#> 2 "*cubilia* _nec_ **arcu auctor**" "<p><em>cubilia</em> <em>nec</em> <strong>arcu auctor</strong></p>"
#> 3 "* quis \n* ligula \n* sapien \n" "<ul><li>quis<br /></li><li>ligula<br /></li><li>sapien</li></ul>"
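You will probably need to adjust the flattening to suit your input: str_flatten() concatenates the pieces with no separator by default, so pass a collapse argument if you want line breaks kept between them.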
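The same item-by-item idea should also work with commonmark, which the question already tried and which converts in memory. A minimal sketch, assuming the commonmark package is installed; the data frame df and the column names col1 and col1_html are placeholders for illustration, not taken from real data:

```r
library(commonmark)

# Hypothetical example data; replace with your own data frame
df <- data.frame(
  col1 = c("## Heading", "*emphasis* and **strong**"),
  stringsAsFactors = FALSE
)

# Convert each element separately so the rows are not merged
# into one document; vapply guarantees one string per element
df$col1_html <- vapply(
  df$col1,
  function(x) markdown_html(x),
  character(1),
  USE.NAMES = FALSE
)

df$col1_html
```

Each result ends with a trailing newline, so wrap the call in trimws() if the consuming program is sensitive to that.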