
R: how to convert Markdown to HTML in a vector that is NOT a file


I have a large R data frame in which some columns contain rich-text strings in Markdown format. I need to convert these from Markdown to HTML using R.

I don't want to render it, and I don't want any files read or written. I just want to convert the fields of a vector or data frame, so that another program can render the HTML later.

I searched a lot for this, but I can only find methods that work via files: they read Markdown files, convert them, and write HTML files. My input is simply a data frame, and I'm afraid that going through files would cost a lot of running time.

I tried commonmark::markdown_html(df$col1) as well as markdown::markdownToHTML(text = df$col1, fragment.only = TRUE), but both make a mess. That is because of this clause in the documentation of markdownToHTML:

text option: a character vector containing the markdown text to transform (each element of this vector is treated as a line in a file).

The last part is the problem: the whole vector is converted as one document, so when I assign the result back, every element/row is filled with the same huge HTML string containing the content of all rows.
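The collapsing is easy to reproduce. The following is a minimal sketch with commonmark only (assumed installed; the sample strings and column names are made up for illustration). markdown_html() joins its whole input into a single document, so a vector of several strings comes back as one string, and assigning that to a column recycles it into every row.

```r
library(commonmark)

x <- c("*cubilia*", "**arcu auctor**")

# All elements are parsed together as ONE Markdown document,
# so a length-2 input yields a length-1 result.
whole <- markdown_html(x)
length(whole)

# Assigning the single string to a data-frame column recycles it,
# which is why every row ends up with identical HTML.
df <- data.frame(col1 = x, stringsAsFactors = FALSE)
df$html <- markdown_html(df$col1)
identical(df$html[1], df$html[2])
```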


Solution

  • Convert item by item instead of passing the whole vector of Markdown strings in one call.
    With the pandoc and purrr packages it might look something like this:

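    library(dplyr)
    library(purrr)
    library(stringr)
    
    # R wrapper around the pandoc binary; if pandoc is not available,
    # pandoc::pandoc_install() can download it.
    library(pandoc)
    
    tibble(md = c("## Lorem ipsum dolor sit amet",
                  "*cubilia* _nec_ **arcu auctor**",
                  "* quis  \n* ligula  \n* sapien  \n")
           ) |> 
      # One pandoc_convert() call per element; it returns the HTML as a
      # character vector of lines, which str_flatten() joins into one string.
      mutate(html = map_chr(md, \(x) pandoc_convert(text = x, to = "html") |> str_flatten()))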
    #> # A tibble: 3 × 2
    #>   md                                   html                                                                   
    #>   <chr>                                <chr>                                                                  
    #> 1 "## Lorem ipsum dolor sit amet"      "<h2 id=\"lorem-ipsum-dolor-sit-amet\">Lorem ipsum dolor sit amet</h2>"
    #> 2 "*cubilia* _nec_ **arcu auctor**"    "<p><em>cubilia</em> <em>nec</em> <strong>arcu auctor</strong></p>"    
    #> 3 "* quis  \n* ligula  \n* sapien  \n" "<ul><li>quis<br /></li><li>ligula<br /></li><li>sapien</li></ul>"
    

    You may need to adjust the flattening to suit your input, e.g. str_flatten(collapse = "\n") if you want to keep pandoc's line breaks inside the HTML.
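The same item-by-item idea also works with commonmark, which the question already tried. commonmark uses its bundled cmark C library in-process, so it avoids starting an external pandoc process for every row, which may matter for a large data frame. This is a minimal sketch, assuming commonmark is installed; the helper name md_to_html and the sample data are hypothetical. Note that commonmark's output differs slightly from pandoc's (e.g. no id attributes on headings).

```r
library(commonmark)

# Hypothetical helper: convert each Markdown string on its own, so that
# element i of the input maps to element i of the output.
md_to_html <- function(x) {
  vapply(x, function(s) {
    if (is.na(s)) return(NA_character_)  # keep missing values missing
    markdown_html(s)                     # one document per element
  }, character(1), USE.NAMES = FALSE)
}

df <- data.frame(
  col1 = c("## Lorem ipsum", "*cubilia* **arcu**", NA),
  stringsAsFactors = FALSE
)
df$col1_html <- md_to_html(df$col1)
df$col1_html
```

Each element of col1_html now holds only the HTML for its own row, with a trailing newline that cmark appends after each block.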